Coronary Heart Disease prediction in the USA and factors that favor its occurrence

Gachanja, Jeremy Kibiru
Strathmore University
Coronary Heart Disease (CHD) is the leading cause of death among adults in Europe and North America (WHO, 2017). Early detection and treatment of this disease is thus a matter of life and death (Gonsalves, Thabtah, Mohammad, & Singh, 2019). This project compared the predictive power of five machine learning algorithms in predicting this disease: Support Vector Machine (SVM), Naive Bayes, Logistic Regression, Decision Trees and Neural Networks. The objective of the study was to determine which of the five algorithms was best suited for CHD prediction and what levels of the CHD risk factors favored the occurrence of CHD. The study considered fourteen CHD risk factors: gender, age, smoking habit, number of cigarettes smoked, use of blood pressure medication, prevalent stroke, prevalent hypertension, diabetes, total cholesterol, diastolic and systolic blood pressure, BMI, heart rate, and education. However, the study found that only age, systolic and diastolic blood pressure, prevalent hypertension, blood pressure medication and diabetes had a significant correlation with CHD occurrence. The study used these seven CHD risk factors to model CHD occurrence with the five algorithms. It found that Logistic Regression was best suited for predicting CHD, followed by Naive Bayes, then Decision Trees, and lastly SVM and Neural Networks. This work found that CHD-positive individuals had high cholesterol (235mm on average), high blood sugar (a maximum of 394mm), had a smoking habit (10.82 cigarettes per day on average), were overweight (BMI of 26.63 on average) and had high blood pressure (a maximum of 295/140 mmHg and 143/86 mmHg on average).
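The correlation-based screening of risk factors described above can be illustrated with a minimal sketch. The abstract does not state which correlation measure or significance rule the study used, so the Pearson (point-biserial) coefficient, the 0.3 cutoff, the function names and the toy values below are all assumptions made for illustration only; they are not the study's data or method.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    With a 0/1 outcome in ys this equals the point-biserial correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(columns, outcome, threshold=0.3):
    """Keep the features whose |r| with the outcome reaches the threshold.

    The threshold is a hypothetical stand-in for a significance test.
    """
    selected = {}
    for name, values in columns.items():
        r = pearson_r(values, outcome)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Made-up toy values, NOT taken from the study's dataset.
toy_columns = {
    "age": [40, 42, 44, 46, 60, 62, 64, 66],
    "education": [1, 2, 1, 2, 1, 2, 1, 2],
}
toy_chd = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = CHD positive (hypothetical labels)

selected = screen_features(toy_columns, toy_chd)
print(sorted(selected))
```

In this toy example only the age column passes the screen, because the education column is constructed to be uncorrelated with the outcome. A real analysis would replace the fixed cutoff with a proper significance test on the actual data.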
Submitted in partial fulfillment of the requirements for the Degree of Financial Economics at Strathmore University