Heart Disease Classification

Cardiovascular disease or heart disease is the leading cause of death amongst women and men and amongst most racial/ethnic groups in the United States. Heart disease describes a range of conditions that affect your heart. Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease. From the CDC, roughly every 1 in 4 deaths each year are due to heart disease. The WHO states that human life style is the main reason behind this heart problem. Apart from this there are many key factors which warns that the person may/may not getting chance of heart disease.

Using UCI’s heart disease dataset found here, I build classification models and used ensemble methods to produce a highly accurate model. In the current dataset, publications of this study included 303 patients and chose 14 out of 76 features that are relevant in predicting heart disease.

Variable Descriptions

age: age in years
sex: sex (1 = male; 0 = female)
cp: chest pain type — Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholestoral in mg/dl
fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg: resting electrocardiographic results- Value 0: normal, Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria
thalach: maximum heart rate achieved
exang: exercise induced angina (1 = yes; 0 = no)
oldpeak = ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment- Value 1: upsloping, Value 2: flat, Value 3: downsloping
ca: number of major vessels (0–3) colored by flourosopy
thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
target: 1 = disease, 0 = no disease

Variable Types

Continuous — age, trestbps, chol, thalach, oldpeak
Binary — sex, fbs, exang, target
Categorical — cp, restecg, slope, ca, thal

Exploratory Data Analysis (code found in 1. EDA.ipynb)

There is a slight class imbalance, but not severe enough to require upsampling/downsampling methods.

Now let’s see how fasting blood sugar (fbs) varies amongst the target variable. Fbs is a diabetes indicator with fbs >120 mg/d is considered diabetic (True):

Here, we see that the number for class true, is lower compared to class false. This provides an indication that fbs might not be a strong feature differentiating between heart disease an non-disease patient.

Next, we'll look at ca which is the number of major blood vessels (0-3) colored by a flourosopy.

We can clearly see that a large proportion of those with heart disease were likely to have 0 major blood vessels to the heart.

Classification (code found in 2. Classification.ipynb

A simple dummy classifier resulted in an accuracy score of 49.18%.

Next I took different machine learning models to find which algorithm has the best accuracy score.

Then I selected the top 3 performing classification models (SVC,Logistic Regression, and Naive Bayes)and inputed them into a StackingCVClassifier. This resulted in an accuracy score of ~92%, outperforming all individual classification models.

Variable ca was the most important feature in predicting the target variable

Grid Search + Ensemble

Using grid search methods on all individual classification models then utilizing the Stacking CVClassifier resulted in the same ensemble accuracy without using grid search.

Summary

From this classification project, we can see that using a stacking cv classifier improves the overall accuracy compared to using individual classification models. Our base model performed at ~49% while our ensemble model performed at ~92% accuracy. Using the grid search plus ensemble methods did not improve the overall accuracy.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Data		Data
Images		Images
.DS_Store		.DS_Store
1. EDA.ipynb		1. EDA.ipynb
2. Classification.ipynb		2. Classification.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Heart Disease Classification

Exploratory Data Analysis (code found in 1. EDA.ipynb)

Classification (code found in 2. Classification.ipynb

Grid Search + Ensemble

Summary

About

Releases

Packages

Languages

mkosaka1/HeartDisease_Classification

Folders and files

Latest commit

History

Repository files navigation

Heart Disease Classification

Exploratory Data Analysis (code found in 1. EDA.ipynb)

Classification (code found in 2. Classification.ipynb

Grid Search + Ensemble

Summary

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages