Determining the optimal machine learning algorithm to predict pre-term birth maternal health using electronic health records in Kenya

dc.contributor.author: Waswa, P. W.
dc.date.accessioned: 2026-04-14T09:13:25Z
dc.date.issued: 2025
dc.description: Full-text thesis
dc.description.abstract: Preterm birth (PTB) remains a leading cause of neonatal morbidity and mortality globally, with a disproportionately higher burden in low- and middle-income countries (LMICs). Despite advances in maternal healthcare, early prediction of PTB remains challenging because of its multifactorial causes and the limitations of traditional risk assessment tools. This study sought to develop and evaluate machine learning (ML) models for PTB prediction using routine maternal clinical data in a low-resource setting. It employed a quantitative, retrospective cohort design: routinely collected maternal health data were extracted from a Level 4 healthcare facility in Bungoma County, Kenya. The dataset included demographic, obstetric, and vital sign parameters collected during delivery between 2023 and 2025. After data cleaning, handling of missing values, and correction of class imbalance using SMOTE, several machine learning models were trained and tested. These included Logistic Regression, Support Vector Machines (SVM), Random Forests, and boosting-based ensemble models such as XGBoost, LightGBM, CatBoost, and AdaBoost. Among the models, CatBoost demonstrated the most balanced performance, with an accuracy of 0.582, a recall of 0.693, an F1-score of 0.591, and the highest AUC, 0.608. AdaBoost achieved the highest sensitivity (recall of 0.789) but a lower overall accuracy (0.547). XGBoost and LightGBM also performed moderately well, with AUCs of 0.607 and 0.606 respectively. Feature importance analysis identified fundal height, temperature, and respiratory rate as the most influential predictors. SHAP analysis confirmed the non-linear and interactive contributions of these features to model predictions.
While the results show potential for ML-based risk stratification tools, the predictive performance of the models remains modest compared with the thresholds reported in the literature for high-income countries (AUC greater than 0.70). In LMICs, however, lower thresholds have been reported, from 0.6161, which is consistent with our study findings. These limitations are likely due to the absence of longitudinal data, such as antenatal care (ANC), nutrition, and demographic data, which have been reported to be key in preterm birth prediction. Nevertheless, the study underscores the feasibility of deploying explainable ML models in low-resource settings and highlights the need for data quality improvements, multi-site validation, and the incorporation of additional clinical features to improve prediction accuracy.
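The abstract compares models on accuracy, recall, F1-score, and AUC. As a rough illustration of how these four figures relate to one another, the sketch below computes them in plain Python on a small made-up set of labels and risk scores. The toy data, the 0.5 decision threshold, and the function names are assumptions for illustration only; they are not taken from the thesis and do not reproduce its results or its code.

```python
# Illustrative sketch only: toy labels and scores, not data from the thesis.

def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1 for binary labels (1 = preterm)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = preterm, 0 = term) and model risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.3, 0.8, 0.55]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("AUC:", auc)
```

Accuracy, recall, and F1 depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds. That is one reason a model can show high recall with low accuracy, as reported for AdaBoost, while its AUC stays close to that of the other models.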
dc.identifier.citation: Waswa, P. W. (2025). Determining the optimal machine learning algorithm to predict pre-term birth maternal health using electronic health records in Kenya [Strathmore University]. https://hdl.handle.net/11071/16383
dc.identifier.uri: https://hdl.handle.net/11071/16383
dc.language.iso: en
dc.publisher: Strathmore University
dc.title: Determining the optimal machine learning algorithm to predict pre-term birth maternal health using electronic health records in Kenya
dc.type: Thesis

Files

Original bundle

Name: Determining the optimal machine learning algorithm to predict pre-term birth maternal health using electronic health records in Kenya.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: