The DHIS2 Analytics Pipeline automates the extraction, transformation, and analysis of health data from a DHIS2 instance. This project focuses on retrieving facility-level weekly data, enriching it with organisational unit details, and exporting the processed data in a structured format for further reporting and visualisation.
- Automated Data Extraction: Queries the DHIS2 API to fetch organisational units at facility level.
- Facility-Level Aggregation: Aggregates health data weekly.
- CSV and Postgres Export: Outputs processed data to CSV files and writes to Postgres for easy access and analysis.
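The extraction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the base URL, credentials, field list, and facility level (assumed here to be level 5) are all placeholders you would adapt to your own DHIS2 instance.

```python
import requests


def org_unit_params(level=5, page=1, page_size=200):
    """Query parameters for the DHIS2 /api/organisationUnits endpoint.
    The facility level (5 here) depends on your org-unit hierarchy."""
    return {
        "fields": "id,name,parent[id,name]",
        "filter": f"level:eq:{level}",
        "paging": "true",
        "page": page,
        "pageSize": page_size,
    }


def fetch_facilities(base_url, auth, level=5):
    """Page through all organisation units at the given level."""
    units, page = [], 1
    while True:
        resp = requests.get(
            f"{base_url}/api/organisationUnits",
            params=org_unit_params(level, page),
            auth=auth,  # e.g. ("admin", "district") on the DHIS2 demo server
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        units.extend(body.get("organisationUnits", []))
        if page >= body.get("pager", {}).get("pageCount", 1):
            return units
        page += 1
```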
- Clone the repository:
git clone https://github.com/malambomutila/DHIS2-Pipeline.git
- Navigate to the project directory:
cd DHIS2-Pipeline
- Install the required dependencies. Note that requirements.txt also includes packages used to run Airflow; you may not need these, and they can conflict with an existing Airflow setup. If in doubt, install only the libraries imported by the script.
pip install -r requirements.txt
- Add the script to your Airflow DAGs folder and run it from there.
- Processed data will be saved to your specified directory as CSV files and also written to your Postgres database.
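The weekly aggregation and export steps can be sketched as below. The column names (`facility`, `date`, `value`) and the table name are assumptions about the schema; for brevity the example writes to SQLite as a stand-in for Postgres, where you would normally pass a SQLAlchemy engine instead.

```python
import pandas as pd


def aggregate_weekly(df: pd.DataFrame) -> pd.DataFrame:
    """Sum facility-level values per facility per ISO week.
    Expects columns: facility, date, value (names are assumptions)."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    iso = df["date"].dt.isocalendar()
    df["year"], df["week"] = iso.year, iso.week
    return df.groupby(["facility", "year", "week"], as_index=False)["value"].sum()


def export(weekly: pd.DataFrame, csv_path: str, conn) -> None:
    """Write the aggregated frame to a CSV file and a database table."""
    weekly.to_csv(csv_path, index=False)
    # For Postgres, pass a SQLAlchemy engine here instead of a raw
    # connection, e.g. create_engine("postgresql+psycopg2://...").
    weekly.to_sql("weekly_data", conn, if_exists="replace", index=False)
```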
The following CSV files are generated by the pipeline:
- data_v1.csv: Weekly facility-level data for the defined period.
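For downstream reporting and visualisation, the generated file can be loaded directly, for example with pandas. The date column name used here is an assumption; inspect the file header for the actual schema.

```python
import pandas as pd


def load_weekly(csv_path="data_v1.csv"):
    """Load the exported weekly facility-level data for reporting."""
    df = pd.read_csv(csv_path)
    # Parse the week column as dates if present (column name is an assumption).
    if "week_start" in df.columns:
        df["week_start"] = pd.to_datetime(df["week_start"])
    return df
```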
Contributions are welcome! Feel free to submit issues, feature requests, or pull requests to improve the project.