Patient state construction from clinical databases for machine learning.
See FeaturesConstructed.md for a list of extracted and constructed features.
Download this package. You will need to set the database connections to your own database structure.
Recommended tables are:
ICUpatients: contains patient ids, ICU admit dates, and ICU discharge dates.
Demographics: contains demographic info (age, sex, height, weight, BMI, race).
Laboratory: contains laboratory test results.
Vital Signs: contains vital sign measurements. Might be combined with laboratory data.
Ventilator settings: contains ventilator settings. Might be combined with vitals or laboratory data.
Medications: contains medication order data. Might have a separate table for Home Medications.
Microbiology: contains microbiology data.
Procedures: contains surgical procedures.
Intake and output: contains the intake and output data.
Other useful tables:
A mapping table that maps equivalent clinical events from different hospitals, such as glucose test results, to one another.
A mapping table that maps clinical event codes to human readable names, table membership, group membership, and data type (discrete, interval, binary).
A mapping table that maps discrete clinical event results to boolean values.
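As a rough illustration of what these three mapping tables provide, the sketch below represents them as in-memory Python dictionaries and uses them to normalize a single event. Every code, name, group, and value shown is a hypothetical placeholder, and `normalize_event` is an illustrative helper, not a function from this package.

```python
# Hypothetical mapping tables, shown as plain dictionaries for illustration only.

# 1. Maps site-specific event codes to one shared code.
SITE_CODE_TO_SHARED = {
    ("hospital_a", "GLU1"): "glucose",
    ("hospital_b", "GLUC-SER"): "glucose",
    ("hospital_b", "BCX"): "blood_culture",
}

# 2. Maps a shared code to its readable name, table, group, and data type.
EVENT_INFO = {
    "glucose": {"name": "Glucose", "table": "Laboratory",
                "group": "chemistry", "dtype": "interval"},
    "blood_culture": {"name": "Blood culture", "table": "Microbiology",
                      "group": "cultures", "dtype": "discrete"},
}

# 3. Maps discrete result strings to boolean values.
DISCRETE_TO_BOOL = {"positive": True, "negative": False}


def normalize_event(site, code, raw_value):
    """Return (shared_code, readable_name, value), or None for unmapped codes."""
    shared = SITE_CODE_TO_SHARED.get((site, code))
    if shared is None:
        return None
    info = EVENT_INFO[shared]
    if info["dtype"] == "interval":
        value = float(raw_value)
    else:
        # Unknown discrete results map to None rather than raising.
        value = DISCRETE_TO_BOOL.get(str(raw_value).strip().lower())
    return shared, info["name"], value
```

In a real deployment these lookups would live in database tables rather than in code; the dictionaries only show which columns each table needs.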
PatientPyFeatureSelection (https://github.com/ajk77/PatientPyFeatureSelection)
RegressiveImputer (https://github.com/ajk77/RegressiveImputer)
(Later versions may add peewee for database connectivity).
Populate the ICUpatients table with only the patients of interest, i.e., the patients that remain after selecting for location, date, and diagnoses.
Add database connections.
In patient_pickler.py and create_feature_vectors.py, set: root_dir, pkl_dir, and case_day_filename.
In create_feature_vectors.py, set: feature_dir and parameters for load_labeled_cases().
Create directory structure for pkl_dir: create pkl_dir folder and sub folders ('root_data/', 'flag_data/', 'med_data/', 'procedure_data/', 'micro_data/', 'io_data/', 'demo_data/').
Create directory structure for feature_dir: create feature_dir folder and sub folders ('root_data/', 'med_data/', 'procedure_data/', 'micro_data/', 'io_data/', 'demo_data/').
Create a labeled_case_list file and the linked participant_info files. See the resource folder for examples.
The labeled case list file lists the exact cases of interest. The participant info files provide the length-of-stay cut times.
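The two directory structures described above can be created by hand or with a short script. The following is a minimal sketch, not part of this package: the sub folder names come from the steps above, while `make_dirs` and the constant names are hypothetical helpers introduced only for illustration.

```python
import os

# Sub folders listed in the setup steps above.
PKL_SUBDIRS = ['root_data', 'flag_data', 'med_data', 'procedure_data',
               'micro_data', 'io_data', 'demo_data']
# feature_dir uses the same sub folders, minus 'flag_data'.
FEATURE_SUBDIRS = ['root_data', 'med_data', 'procedure_data',
                   'micro_data', 'io_data', 'demo_data']


def make_dirs(base_dir, subdirs):
    """Create base_dir and each sub folder; existing folders are left alone."""
    created = []
    for sub in subdirs:
        path = os.path.join(base_dir, sub)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Call `make_dirs(pkl_dir, PKL_SUBDIRS)` once, and `make_dirs(feature_dir, FEATURE_SUBDIRS)` once for each feature_dir you plan to use, substituting your own paths.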
Run patient_pickler.py once.
Run create_feature_vectors.py once for each desired patient set, updating feature_dir and load_labeled_cases() parameters each time.
Run assemble_feature_matrix.py once for each directory filled by create_feature_vectors.py.
Run InstantiateExperimentDriver.py; this can be run multiple times on each assembled feature matrix. This is where fold assignment, imputation, feature selection, and machine learning occur.
Version 3.0. For the versions available, see https://github.com/ajk77/patientpy
- Andrew J King - Doctoral Candidate (at time of creation)
- Website (https://www.andrewjking.com/)
- Twitter (https://twitter.com/andrewsjourney)
- Shyam Visweswaran - Principal Investigator
- Website (http://www.thevislab.com/)
- Twitter (https://twitter.com/Shyam_Vis)
- Gregory F Cooper - Doctoral Advisor
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
- Harry Hochheiser
- Twitter (https://twitter.com/hshoch)
- Gilles Clermont
- Milos Hauskrecht