PTSD Methylation Risk Score 2024

The code in this repository reflects the steps used to pre-process data for machine learning, train models to predict PTSD, and create a methylation risk score, as published in BMC Medical Genomics. Also provided are the final weights and features for each of the three published risk scores.

Main Contributor

Agaz Wani

Files:

Key Instruction: The weights and features for each of the three published risk scores are located in the Data folder. The files are named as follows:

eMRS_Model1.xlsx
MoRS_Model2.xlsx
MoRSAE_Model3.xlsx
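
As a rough illustration of how a weights file of this kind might be applied, the sketch below computes a risk score as a weighted sum of CpG methylation values. The column layout of the .xlsx files is not described in this README, so the CpG IDs, the weights, and the `risk_score` function are hypothetical placeholders, and the weighted-sum form is an assumption rather than a statement of the published procedure.

```python
# Hypothetical weights: CpG ID -> model weight.
# Placeholder values for illustration only, not taken from the published files.
weights = {"cg00000001": 0.5, "cg00000002": -0.25, "cg00000003": 1.0}


def risk_score(sample, weights):
    """Weighted sum of methylation values over the CpGs in the weights table.

    Raises KeyError if the sample lacks a required CpG, so that missing
    features are not silently treated as zero.
    """
    return sum(w * sample[cpg] for cpg, w in weights.items())


# Hypothetical methylation beta values for a single sample.
sample = {"cg00000001": 0.8, "cg00000002": 0.4, "cg00000003": 0.1}
score = risk_score(sample, weights)
print(round(score, 3))
```

Raising an error on a missing CpG is a deliberate choice in this sketch: a cohort measured on a different array may lack some features, and that is better surfaced than hidden.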

Helper functions

install_needed_packages.R installs the required packages.
DNHS_more_pheno.R gets the required variables for DNHS.
Smoking_Scores_PGC_cohorts.R estimates smoking scores for each individual discovery cohort.
MRS_Preprocess.R pre-processes the Marine Resilience Study cohort for inclusion in training.
Armystarrs_and_PRISMO_preprocess.R pre-processes the pre- and post-deployment samples of the Army STARRS and PRISMO cohorts, which are used to test the risk scores.
Check_after_updating_pheno.R and Check_after_updating_pheno.html check the updated phenotype file against the old file.
cpgassoc2.R helper function to perform association analysis between each CpG and PTSD.
Covariate_adjustment_1.R example code showing covariate adjustment.
Compare_Effect_Sizes.Rmd and Compare_Effect_Sizes.html code to compare the effect sizes of the discovery and Boston VA cohorts for model 1.
Demographics.R code to get demographic information for the manuscript.
Cohort_Information.Rmd and Cohort_Information.html code to get summary information from different cohorts, e.g., variables in each cohort to check data availability.

Python code to train and test models:

makedirectory.py creates a directory to store the output files from each run.
Settings.ipynb contains settings for packages and plots.
Preprocess_data_updated_1.ipynb pre-processes all cohorts individually for machine learning.
pre_post_trauma_processing_v1.ipynb pre-processes the cohorts with pre/post samples and selects the post-trauma samples for machine learning.
Imputation_Covariate_adjustment_2.1.ipyn Code to perform imputation and covariate adjustment.
Imputation_Covariate_adjustment_including_Expo_vaiables_2.1.ipynb Code to perform imputation and covariate adjustment, including exposure variables.
Feature_Selection_and_training_on_ptsdpm_3.3.ipynb Feature selection using the covariate-adjusted data (output of step 3).
Feature_Selection_and_training_on_ptsdpm_wd_exp_vars_adjustment_3.3.ipynb Feature selection using the covariate-adjusted data for exposure variables (input is step 4 output).
model_performance_5.5.ipynb Runs the model and evaluates its performance (input is step 5 output).
model_performance_wd_exp_vaars_adjustment_5.5.ipynb Runs the model and evaluates its performance with adjusted exposure variables (input is step 6 output).
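
The imputation and covariate-adjustment notebooks listed above are not reproduced here. As a minimal sketch of the general idea, the example below assumes mean imputation and adjustment for a single covariate by taking least-squares residuals. Both choices, the function names `impute_mean` and `residualize`, and the data are illustrative assumptions; the notebooks may use different methods and several covariates.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def residualize(y, x):
    """Residuals of y after a simple least-squares fit on one covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: one CpG's methylation values (one missing) and age as the covariate.
cpg = [0.25, None, 0.75, 1.0]
age = [20, 30, 40, 50]

cpg_imputed = impute_mean(cpg)
cpg_adjusted = residualize(cpg_imputed, age)
print(cpg_adjusted)
```

The adjusted values sum to zero and are uncorrelated with the covariate, which is the property that makes residuals usable as covariate-adjusted features.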

R code for downstream analysis:

downstream_analysis_v5.qmd estimates risk scores for models 1 and 2 and tests them on the test set of the discovery cohorts. downstream_analysis_v5.html is the generated report. In steps 2 and 3, we test various subsets, such as the test set, civilians, military, males, and females, to examine different scenarios.
downstream_analysis_adj_for_Exp_Vars_v5.qmd estimates and tests risk scores using model 3 on the test data set. downstream_analysis_adj_for_Exp_Vars_v5.html is the generated report.
Test_RiskScores_with&without_exp_vars_wd_logit_6.Rmd is a clean version of the code for estimating and testing risk scores. It uses the point-biserial correlation between binary and continuous variables and a logit model to predict PTSD from risk scores. Test_RiskScores_with&without_exp_vars_wd_logit_6.html is the generated report. This file was used to generate density, distribution, and correlation plots for the discovery cohorts.
Pre_Post_Deployment_eMRS.qmd and Pre_Post_Deployment_eMRS.html test risk scores pre- and post-deployment.
Enrichment_analysis_1.qmd performs enrichment analysis of the top CpGs from models 1, 2, and 3. Models 1 and 2 have the same set of CpGs.
CpGs_in_previous_studies&ML.R code to find overlap between identified significant CpGs and previous studies.
Overlap_between_MRS_CpGs_metaanalysis_CpGs_Freeze3.R to check overlap between identified significant CpGs and PGC EWAS meta-analysis and Freeze3 genes.
mQTL.qmd and mQTL.html compare significant CpGs with BIOS QTL browser CpGs.
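
For readers unfamiliar with the point-biserial correlation mentioned above, the following sketch shows the textbook formula applied to a binary PTSD indicator and a continuous risk score. It is written in Python for illustration only (the analysis itself is in R), and the function name `point_biserial` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Equivalent to the Pearson correlation with the binary variable coded 0/1.
    """
    group1 = [c for b, c in zip(binary, continuous) if b == 1]
    group0 = [c for b, c in zip(binary, continuous) if b == 0]
    n1, n0 = len(group1), len(group0)
    n = n1 + n0
    s = pstdev(continuous)  # population standard deviation of all values
    return (mean(group1) - mean(group0)) / s * sqrt(n1 * n0 / n**2)


# Hypothetical PTSD status (1 = case, 0 = control) and risk scores.
ptsd = [0, 0, 1, 1]
scores = [1.0, 2.0, 3.0, 4.0]
r = point_biserial(ptsd, scores)
print(round(r, 4))
```

A positive value indicates that cases tend to have higher risk scores than controls; swapping the 0/1 coding flips the sign.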

Code for external cohorts is in R/Independent_Cohort:

Create_sample_data.R and Create_sample_data without exp vars.R code to create sample data with and without exposure variables as an example for external cohorts.
Covariate_Adj_RiskScores_1.R and Covariate_Adj_RiskScores_without_exp_vars_1.R code to estimate risk scores with and without exposure variables, respectively.
Test_RiskScores_with&without_exp_vars_wd_logit_2.Rmd code to test risk scores and generate plots.
