- Project Overview
- Repository Structure
- Data Generation
- ETL Process
- Data Warehouse
- Dashboard and Visualization
- Automation with Pabbly Connect
- Future Updates
This project builds a Data Warehouse (DWH) for a real-estate management company, used to evaluate agents' performance based on the transactions they manage and carry out. The company facilitates property transactions by connecting property owners with potential clients for purchase or rental. It allocates agents to market properties, arrange viewings, negotiate terms, and carry out the transaction process, and it also handles maintenance and repairs of properties on behalf of their owners.
Key Objectives:
- Design and implement a relational database schema.
- Design and implement a data warehouse based on the star schema model.
- Create an ETL pipeline to populate the data warehouse.
- Answer analytical queries against the star schema.
- Visualize the data using Power BI and automate dashboard updates.
[Figure: Project Stages]
[Figure: Project Flow]
The repository contains the following folders:
- `Database`
  - `Datasets`: Contains CSV files for each entity.
  - `DDL Queries.sql`: SQL script to create the database tables.
  - `Documentation.pdf`: Documentation for the database schema.
  - `ERD.png`: Entity-Relationship Diagram (ERD).
- `DWH Dimensional Modelling`
  - `Star Schema Blueprint.xlsx`: Blueprint for the star schema.
- `E2E DWH Pipeline`
  - `E2EPipelineExec.ipynb`: Jupyter Notebook demonstrating the complete pipeline execution.
  - `Dashboard Snips`: Snapshots of the Power BI dashboard.
  - `Pipeline_Support`: Python scripts for data generation, dimensional queries, ETL support functions, the ETL master function, fact snapshot creation, and fact snapshot uploading.
- `Power BI Desktop Template`
  - `Agent Performance Dashboard.pbix`: Power BI Desktop template for the dashboard.
- `Pabbly Workflow Snippet`: Shows the setup snippet for the Pabbly Connect workflow.
The data generation script uses Mockaroo to create synthetic data for the database.
- Run the data generation script from the pipeline notebook:

  ```
  %run Pipeline_Support/DataGen.py
  ```

  (or `python Pipeline_Support/DataGen.py` from a shell).

- This creates the CSV files in the `Database/Datasets` folder. A rough sketch of such a script appears below.
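`DataGen.py` itself lives in `Pipeline_Support`; as a rough illustration only, a generator built on Mockaroo's saved-schema API might look like the sketch below. The schema ID, API key, row count, and output file are placeholders, and the actual script may build the data differently.

```python
import requests

# Placeholders: substitute a real Mockaroo schema ID and API key.
API_KEY = "YOUR_MOCKAROO_API_KEY"
SCHEMA_ID = "agents"
ROW_COUNT = 500

# Mockaroo serves rows from a saved schema at this endpoint pattern.
url = f"https://api.mockaroo.com/api/{SCHEMA_ID}?count={ROW_COUNT}&key={API_KEY}"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Persist the generated rows as a CSV under Database/Datasets.
with open("Database/Datasets/Agents.csv", "wb") as f:
    f.write(response.content)
```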
The ETL process extracts data from the OLTP database, transforms it, and loads it into the data warehouse.
The ETL pipeline will (a minimal sketch follows this list):
- Fetch datasets from the `Datasets` folder.
- Treat missing values.
- Correct data types.
- Create the star schema dimensions and fact table.
- Upload the fact table snapshot to GitHub.
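Here is a minimal sketch of the extract-and-transform steps, assuming pandas; the column names (`TransactionID`, `TransactionDate`, and so on) are illustrative, and the real logic lives in the `Pipeline_Support` ETL functions.

```python
import pandas as pd

# Extract: fetch one dataset from the Datasets folder.
transactions = pd.read_csv("Database/Datasets/Transactions.csv")

# Treat missing values: fill numeric gaps, drop rows missing the key.
transactions["MaintenanceExp"] = transactions["MaintenanceExp"].fillna(0)
transactions = transactions.dropna(subset=["TransactionID"])

# Correct data types: parse dates and cast measures to numeric.
transactions["TransactionDate"] = pd.to_datetime(transactions["TransactionDate"])
transactions["TransactionValue"] = transactions["TransactionValue"].astype(float)
```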
The star schema includes the following dimensions and fact table (a construction sketch follows the list):
- Dimensions:
  - Date
  - Location
  - Agent
  - PropertyDetails
  - Listing
- Fact table:
  - Transaction facts such as MaintenanceExp, AskedAmount, TransactionValue, CommissionRate, CommissionValue, NegotiationDays, and ClosingDays.
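As an illustration of how a dimension and the fact table can be assembled with pandas (the key and column names below are assumptions, not the project's exact schema):

```python
import pandas as pd

transactions = pd.read_csv("Database/Datasets/Transactions.csv",
                           parse_dates=["TransactionDate"])

# Date dimension: one row per calendar date, with a surrogate key.
dim_date = (transactions[["TransactionDate"]]
            .drop_duplicates()
            .reset_index(drop=True))
dim_date["DateKey"] = dim_date.index + 1
dim_date["Year"] = dim_date["TransactionDate"].dt.year
dim_date["Month"] = dim_date["TransactionDate"].dt.month

# Fact table: foreign keys into the dimensions plus the measures.
fact_transactions = (transactions
                     .merge(dim_date[["TransactionDate", "DateKey"]],
                            on="TransactionDate")
                     [["DateKey", "AgentID", "MaintenanceExp", "AskedAmount",
                       "TransactionValue", "CommissionRate", "CommissionValue",
                       "NegotiationDays", "ClosingDays"]])
```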
The Power BI dashboard visualizes the data from the fact table snapshot.
- Connect Power BI Desktop to the fact table snapshot.
- Design the dashboard and publish it to the web.
- View the published dashboard here: https://app.powerbi.com/view?r=eyJrIjoiZjYyOWQxMWItMGFmNi00M2QyLWIzYWItMDYxOTc3ZjBmNmYwIiwidCI6ImZlZTNiOTE2LTAxYzEtNDk4Ny1hNjQ2LWUxOTM0MzJiOWVhYSIsImMiOjl9
To automate dashboard updates, set up a workflow in Pabbly Connect:
- Trigger: a new GitHub commit.
- Action: refresh the Power BI dashboard dataset.

Every time the fact table snapshot is updated on GitHub, the Power BI dashboard refreshes automatically.
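Pabbly Connect wires this trigger-action pair together without code. For reference, the equivalent action against Power BI's documented REST API ("Datasets - Refresh Dataset In Group") looks roughly like the sketch below; the access token, workspace ID, and dataset ID are placeholders.

```python
import requests

# Placeholders: a valid Azure AD bearer token and the workspace/dataset IDs.
ACCESS_TOKEN = "YOUR_AZURE_AD_TOKEN"
WORKSPACE_ID = "YOUR_WORKSPACE_ID"
DATASET_ID = "YOUR_DATASET_ID"

# Queue a dataset refresh; Power BI replies with HTTP 202 on success.
url = (f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
       f"/datasets/{DATASET_ID}/refreshes")
response = requests.post(url,
                         headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
                         timeout=30)
response.raise_for_status()
```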
For updating the data warehouse with new data:
- Generate new data for the desired year.
- Append the new data to the existing datasets.
- Modify the start and end dates in the ETL master function script.
- Re-run the pipeline from the start.
By following these steps, the fact table snapshot and the Power BI dashboard will be updated with the new data.
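For instance, the append step might look like this sketch, assuming the new batch shares the existing CSV's columns (file names are illustrative):

```python
import pandas as pd

# Load the existing dataset and a hypothetical new-year batch.
existing = pd.read_csv("Database/Datasets/Transactions.csv")
new_rows = pd.read_csv("Database/Datasets/Transactions_2024.csv")

# Append and write back; the ETL pipeline is then re-run from the start.
combined = pd.concat([existing, new_rows], ignore_index=True)
combined.to_csv("Database/Datasets/Transactions.csv", index=False)
```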