Real-Estate Management Data Warehouse End-to-End Pipeline

This repository contains the end-to-end pipeline for building a data warehouse for a real estate management company. The pipeline covers data generation, the ETL process, creation of the star schema dimensions and fact table, visualization with Power BI, and automation with Pabbly Connect.

Table of Contents

  1. Project Overview
  2. Repository Structure
  3. Data Generation
  4. ETL Process
  5. Data Warehouse
  6. Dashboard and Visualization
  7. Automation with Pabbly Connect
  8. Future Updates

Project Overview

This project builds a Data Warehouse (DWH) for a real-estate management company, used specifically to evaluate agents' performance based on the transactions they manage and carry out. The company facilitates property transactions by connecting property owners with potential clients for purchase or rental. It assigns agents to market properties, arrange viewings, negotiate terms, and carry out the transaction process, and it also handles maintenance and repairs of properties on behalf of their owners.

Key Objectives:

  • Design and implement a relational database schema.
  • Design and implement a data warehouse based on the star schema model.
  • Create an ETL pipeline to populate the data warehouse.
  • Answer analytical queries against the star schema.
  • Visualize the data using Power BI and automate dashboard updates.

Project Stages:

(Project stages diagram; see the image in the repository.)

Project Flow:

(Project flow diagram; see the image in the repository.)

Repository Structure

The repository contains the following folders:

  1. Database

    • Datasets - Contains CSV files for each entity.
    • DDL Queries.sql - SQL script to create database tables.
    • Documentation.pdf - Documentation for database schema.
    • ERD.png - Entity-Relationship Diagram (ERD).
  2. DWH Dimensional Modelling

    • Star Schema Blueprint.xlsx - Blueprint for the star schema.
  3. E2E DWH Pipeline

    • E2EPipelineExec.ipynb - Jupyter Notebook demonstrating the complete pipeline execution.
    • Dashboard Snips - Snapshots of the Power BI dashboard.
    • Pipeline_Support - Contains Python scripts for Data Generation, Dimensional Queries, ETL Support Functions, ETL Master Function, Fact Snapshot Creation, and Fact Snapshot Uploading.
  4. Power BI Desktop Template

    • Agent Performance Dashboard.pbix - Power BI desktop template for the dashboard.
  5. Pabbly Workflow Snippet

    • Shows the setup snippet for Pabbly Connect Workflow.

Data Generation

The data generation script uses Mockaroo to create synthetic data for the database.

  1. Run the data generation script from a terminal:

    python Pipeline_Support/DataGen.py

    or, from within the pipeline notebook:

    %run Pipeline_Support/DataGen.py
  2. This will create CSV files in the Database/Datasets folder.
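
For orientation, below is a minimal sketch of what a data generation step might look like. It is not the repository's DataGen.py (which reportedly uses Mockaroo); the output path, the Agents entity, and the column names (AgentID, AgentName, HireDate) are illustrative assumptions.

    # Illustrative sketch only -- the repository's actual DataGen.py may differ.
    # Writes a small synthetic Agents CSV into Database/Datasets using only the
    # standard library; column names are hypothetical placeholders.
    import csv
    import random
    from datetime import date, timedelta
    from pathlib import Path

    out_dir = Path("Database/Datasets")
    out_dir.mkdir(parents=True, exist_ok=True)

    first_names = ["Ava", "Bilal", "Chen", "Dana", "Emre"]
    last_names = ["Khan", "Lee", "Ortiz", "Patel", "Smith"]

    rows = []
    for agent_id in range(1, 51):
        hire_date = date(2020, 1, 1) + timedelta(days=random.randint(0, 1460))
        rows.append({
            "AgentID": agent_id,
            "AgentName": f"{random.choice(first_names)} {random.choice(last_names)}",
            "HireDate": hire_date.isoformat(),
        })

    with open(out_dir / "Agents.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["AgentID", "AgentName", "HireDate"])
        writer.writeheader()
        writer.writerows(rows)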

ETL Process

The ETL process extracts data from the OLTP database, transforms it, and loads it into the data warehouse.

The ETL pipeline will:

  • Fetch datasets from the Datasets folder.
  • Treat missing values.
  • Correct data types.
  • Create the star schema dimensions and fact table.
  • Upload the fact table snapshot to GitHub.
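
The sketch below illustrates these steps in miniature with pandas. It is not the repository's ETL master function; the file names (Transactions.csv, Agents.csv), column names (TransactionDate, CommissionRate, AgentID), and fact grain are assumptions made for the example.

    # Minimal ETL sketch (not the repository's ETL master function).
    import pandas as pd

    # Extract: load raw entity CSVs produced by the data generation step.
    transactions = pd.read_csv("Database/Datasets/Transactions.csv")
    agents = pd.read_csv("Database/Datasets/Agents.csv")

    # Transform: treat missing values and correct data types.
    transactions["CommissionRate"] = transactions["CommissionRate"].fillna(0.0)
    transactions["TransactionDate"] = pd.to_datetime(transactions["TransactionDate"])

    # Build a simple Date dimension covering the transaction period.
    dates = pd.date_range(transactions["TransactionDate"].min(),
                          transactions["TransactionDate"].max(), freq="D")
    dim_date = pd.DataFrame({
        "DateKey": dates.strftime("%Y%m%d").astype(int),
        "Date": dates, "Year": dates.year, "Month": dates.month,
    })

    # Build the fact table by attaching surrogate keys from the dimensions.
    fact = transactions.assign(
        DateKey=transactions["TransactionDate"].dt.strftime("%Y%m%d").astype(int)
    ).merge(agents[["AgentID"]], on="AgentID", how="left")

    # Load: persist the fact snapshot for the dashboard / GitHub upload step.
    fact.to_csv("FactSnapshot.csv", index=False)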

Data Warehouse

The star schema includes the following dimensions and fact table:

  • Dimensions:

    • Date
    • Location
    • Agent
    • PropertyDetails
    • Listing
  • Fact Table:

    • Transaction facts such as MaintenanceExp, AskedAmount, TransactionValue, CommissionRate, CommissionValue, NegotiationDays, ClosingDays.
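
As an illustration of how analytical queries run against this schema, the snippet below ranks agents by commission earned per year. The fact measures (TransactionValue, CommissionValue, NegotiationDays) come from the list above, but the dimension file names and surrogate key names (AgentKey, DateKey) are assumptions.

    # Illustrative analytical query against the star schema (assumed key names).
    import pandas as pd

    fact = pd.read_csv("FactSnapshot.csv")      # fact table snapshot
    dim_agent = pd.read_csv("DimAgent.csv")     # Agent dimension (assumed file name)
    dim_date = pd.read_csv("DimDate.csv")       # Date dimension (assumed file name)

    agent_perf = (
        fact.merge(dim_agent, on="AgentKey")    # join on assumed surrogate keys
            .merge(dim_date, on="DateKey")
            .groupby(["Year", "AgentName"], as_index=False)
            .agg(TotalTransactionValue=("TransactionValue", "sum"),
                 TotalCommission=("CommissionValue", "sum"),
                 AvgNegotiationDays=("NegotiationDays", "mean"))
            .sort_values(["Year", "TotalCommission"], ascending=[True, False])
    )
    print(agent_perf.head(10))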

Dashboard and Visualization

The Power BI dashboard visualizes the data from the fact table snapshot.

  1. Connect Power BI Desktop to the fact table snapshot.

  2. Design the dashboard and publish it to the web.

  3. The published dashboard is available at: https://app.powerbi.com/view?r=eyJrIjoiZjYyOWQxMWItMGFmNi00M2QyLWIzYWItMDYxOTc3ZjBmNmYwIiwidCI6ImZlZTNiOTE2LTAxYzEtNDk4Ny1hNjQ2LWUxOTM0MzJiOWVhYSIsImMiOjl9

Automation with Pabbly Connect

To automate dashboard updates:

  1. Set up a workflow in Pabbly Connect:

    • Trigger: GitHub commit.
    • Action: Refresh Power BI dashboard dataset.
  2. Every time the fact table snapshot is updated on GitHub, the Power BI dashboard will refresh automatically.
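
The commit that triggers this workflow comes from the fact snapshot upload step. A sketch of such an upload via the GitHub Contents API is shown below; the repository's Fact Snapshot Uploading script may use a different method, and the token and file path here are placeholders.

    # Sketch: push the updated fact snapshot to GitHub, producing the commit
    # that the Pabbly Connect workflow listens for. Token and path are placeholders.
    import base64
    import requests

    TOKEN = "ghp_..."   # personal access token (placeholder)
    URL = "https://api.github.com/repos/hase3b/End-to-End-DWH-Pipeline/contents/FactSnapshot.csv"
    HEADERS = {"Authorization": f"token {TOKEN}", "Accept": "application/vnd.github+json"}

    with open("FactSnapshot.csv", "rb") as f:
        content_b64 = base64.b64encode(f.read()).decode()

    # The existing file's blob SHA is required when updating (rather than creating) a file.
    current = requests.get(URL, headers=HEADERS).json()
    payload = {"message": "Update fact table snapshot",
               "content": content_b64,
               "sha": current.get("sha")}

    resp = requests.put(URL, headers=HEADERS, json=payload)
    resp.raise_for_status()   # the new commit then triggers the Pabbly Connect workflow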

Future Updates

For updating the data warehouse with new data:

  1. Generate new data for the desired year.
  2. Append the new data to the existing datasets.
  3. Modify the start and end dates in the ETL master function script.
  4. Re-run the pipeline from the start.

By following these steps, the fact table snapshot and the Power BI dashboard will be updated with the new data.
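
A minimal sketch of steps 1-3 is shown below. The file names and the date-range variables are hypothetical; the actual start and end dates live in the ETL master function script in Pipeline_Support.

    # Sketch: append a newly generated year of data and widen the date window
    # before re-running the pipeline. Names are illustrative only.
    import pandas as pd

    existing = pd.read_csv("Database/Datasets/Transactions.csv")
    new_year = pd.read_csv("Database/Datasets/Transactions_2025.csv")  # newly generated data

    combined = pd.concat([existing, new_year], ignore_index=True)
    combined.to_csv("Database/Datasets/Transactions.csv", index=False)

    # Hypothetical date-range parameters mirroring those in the ETL master function.
    ETL_START_DATE = "2020-01-01"
    ETL_END_DATE = "2025-12-31"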

