MSc.SS Theses and Dissertations (2021)
Browsing MSc.SS Theses and Dissertations (2021) by Title
- Classification of X-rays images using Deep Convolutional Neural Network: COVID-19 (Strathmore University, 2021) Bore, Laban Kipchirchir
The growing number of labeled X-ray image archives has spurred research on applying statistics, machine learning, deep learning, and computer vision across different domains. Recent studies applying deep transfer learning (60) with CNNs to detect and classify small COVID-19 datasets have had major success. COVID-19 datasets have been collected since the outbreak of the virus in the fourth quarter of 2019. The COVID-19 virus complicated the diagnosis, treatment, and care of patients because there is no cure and the virus mutates into different fatal variants. This has led to thousands of deaths and to increased admissions to hospitals, ICUs, and other health facilities. Hundreds of thousands of new infection cases are reported daily across the world. The overburdening of the health system by the COVID-19 virus has made access to other health services difficult in the under-served world (89). Traditionally, medical doctors carry out several tests, such as full blood counts to ascertain whether the body is fighting certain pathogens, sputum tests, and chest X-rays. Doctors also examine patients' medical history and carry out physical exams, such as listening to the lungs with a stethoscope for abnormal crackling sounds. The success of this traditional diagnosis process depends on the doctors' experience and skills, the quality of the X-ray images, and the availability of the patient's historical records. These conditions are almost unattainable and unsustainable in the under-served countries of Africa. The motivation of this paper is to complement the traditional diagnosis and analysis of chest X-ray images by introducing machine classification approaches and the state-of-the-art deep residual network ResNet18 (14, 35).
According to WHO (58), diagnosis is a process and requires classification steps to inform research, health policies, and care of the patients. An alternative definition is a "pre-existing set of categories agreed upon by the medical profession to designate a specific condition" (43). We applied a statistical learning model to separate and classify all X-ray images with patchy areas into one distinct class for further research, examination, analysis, and care of the patients. The observed white patchy areas in our X-ray images were our statistical variable of interest in classifying chest X-ray images into COVID-19 and non-COVID-19, pg 3.2. In addition, the final model can be replicated on other non-COVID datasets and extended to other related classification tasks. A deep CNN classification model (ResNet18), viewed as a form of non-parametric statistics, was used to classify and predict COVID-19 positive images. The datasets comprised COVID-19 positive cases (184) and COVID-19 negative cases (5000), aggregated from different sources. The COVID-19 negative cases came from 10 disease categories (Pneumonia, Pneumothorax, Lung opacity, Fracture, Atelectasis, Edema, pleural, etc.). The fine-tuned deep CNN model (ResNet18) performed well, with precision (87.5%), sensitivity (75%) and specificity (99.8%). Rerunning the model on a larger dataset, enlarged by adding noise through data augmentation, yielded sensitivity (90%) and specificity (100%). Hence, when more data is fed into the neural model, classification performance measures such as precision, AUC and recall improve significantly. This classification model can be used to aid radiologists or medical practitioners in chest X-ray image diagnosis and treatment (59) by categorization, diagnosis, detection, and prediction.
Future extensions of this research will use larger COVID-19 or non-COVID-19 datasets, with more emphasis on systematic review of data acquisition, data certification, model development and pitfalls, and explanation construction (39).
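The precision, sensitivity and specificity quoted in the abstract above are all ratios of confusion-matrix counts. The following is a minimal sketch of those standard definitions; the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the thesis.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    return {
        "precision": tp / (tp + fp),      # share of predicted positives that are correct
        "sensitivity": tp / (tp + fn),    # share of true positives that are detected (recall)
        "specificity": tn / (tn + fp),    # share of true negatives that are detected
    }


# Hypothetical counts for illustration only; not the thesis's results.
metrics = classification_metrics(tp=30, fp=5, tn=955, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a small positive class and a large negative class, specificity can be very high even when sensitivity is modest, which is why the abstract reports the measures separately.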
- Forecasting of the inflation rates in Kenya: a comparison of ANN, ARIMA and SARIMA (Strathmore University, 2021) Kogei, Victor Kiprono
Monetary policy goals such as price stability are regulated by the Central Bank of Kenya (CBK). Price stability is a key indicator of stable and predictable inflation. Accuracy and reliability in forecasting inflation rates, or in predicting their trend correctly, are essential to investors, academia and policymakers. This calls for models that predict inflation rates accurately, to spur investment and economic growth. Intelligence-based models have been found to be robust in forecasting financial and economic series such as inflation rates and stock prices. This research therefore employs an artificial neural network (ANN) to forecast inflation rates in Kenya and compares its performance with the statistical models ARIMA and SARIMA. Artificial neural network models emulate the information-processing capabilities of neurons in the human brain, making them flexible enough to map inputs to outputs well. A major advantage of ANNs is their ability to capture both linear and non-linear patterns because they lack the assumptions that statistical models impose. The variables used were inflation rates, gross domestic product (GDP) and exchange rates. The variables are monthly data from January 2012 to February 2021. The prediction performance of the three models was evaluated using root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The results show that the artificial neural network outperformed the ARIMA and SARIMA models. The implication is that the government can adopt an artificial neural network for forecasting inflation rates in Kenya.
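The three error measures named in the abstract above have standard definitions, sketched below in plain Python. The sample series `actual` and `forecast` are hypothetical values chosen for illustration and are not Kenyan inflation data.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error: average size of the forecast error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical values for illustration only.
actual = [5.0, 6.0, 8.0]
forecast = [5.5, 5.5, 8.0]

print(f"RMSE = {rmse(actual, forecast):.4f}")
print(f"MAE  = {mae(actual, forecast):.4f}")
print(f"MAPE = {mape(actual, forecast):.2f}%")
```

When comparing models, each measure is computed on the same held-out period for every model, and the model with the lowest errors is preferred.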
- A Joint modelling approach of monthly anthropometry and time to death among hospitalized severe malnourished children in Kenya (Strathmore University, 2021) Maronga, Christopher Sianyo
Background: In follow-up studies, interest often lies in understanding the association between biomarkers measured over time and a time-to-event outcome. For this, a two-stage separate analysis or a time-dependent Cox model is often used. The former approach does not account for shared features between the two processes, while the latter ignores the endogeneity in the biomarker, resulting in inefficient and biased estimates. The objective of this project was to fit joint models on longitudinal anthropometry and time to death among children hospitalized with complicated SAM in four hospitals in Kenya. Methods: Data from a randomised placebo-controlled trial for 1,778 children aged 2 to 59 months admitted to hospital with complicated Severe Acute Malnutrition (SAM) but without HIV were analysed. We used linear mixed-effects models to model longitudinal anthropometry and a Cox proportional hazards model to assess the effect of a priori selected baseline covariates on mortality. The two models were linked through current value and slope association to create a joint model used to study the effect of longitudinal anthropometry on risk of death. Results: The joint model results showed that a unit centimetre gain in monthly mid-upper arm circumference (MUAC) was associated with a 46.8% reduction in hazard of death, 0.532 (95% CI: 0.476-0.596), while a unit gain in standard deviation (SD) for weight-for-height (WHZ) was associated with a 37.1% reduction in the risk of death, 0.629 (95% CI: 0.579-0.683). A unit gain in SD for monthly weight-for-age (WAZ) and height-for-age (HAZ) was associated with 21.2%, 0.788 (95% CI: 0.742-0.837) and 2.5%, 0.227 (95% CI: 0.008-6.556) reduction in risk of mortality respectively.
Conclusion: In studying the relationship between a survival outcome and covariates, researchers often use baseline values of the covariates, which fails to account for the interdependencies. Using a joint modelling framework, we quantified the association between four longitudinal anthropometric measures and risk of death. Through current value and slope association, MUAC and WHZ had the strongest and second-strongest associations with risk of death respectively, and hence are better metrics that can be used to screen and identify high-risk children.
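The percentage reductions quoted in the Results above follow from the reported ratios if those ratios are read as hazard ratios, using reduction = (1 - HR) x 100. The sketch below checks that arithmetic for the MUAC, WHZ and WAZ figures; the function name `percent_reduction` is hypothetical, and the hazard-ratio reading is an assumption consistent with those three values.

```python
def percent_reduction(hazard_ratio):
    """Percentage reduction in hazard implied by a hazard ratio below 1."""
    return (1 - hazard_ratio) * 100


# Ratios as reported in the abstract for a one-unit gain in each measure.
reported = {"MUAC": 0.532, "WHZ": 0.629, "WAZ": 0.788}

for measure, hr in reported.items():
    print(f"{measure}: HR = {hr:.3f} -> {percent_reduction(hr):.1f}% lower hazard")
```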
- Predictive modeling of Logistics Performance Index using Sparse Regression Models (Strathmore University, 2021) Odok, Eric Oyenga
The Logistics Performance Index (LPI), developed by The World Bank, is the only interactive benchmarking tool countries use to identify challenges and opportunities in trade logistics. It was developed using Principal Component Analysis and is a mean of highly correlated variable scores. This poses two major problems: the susceptibility of mean-computed measures to outliers, and multicollinearity in prediction, which leads to overfitting. It is therefore critical to choose prediction techniques carefully. Regression is one of many techniques that can reliably predict the LPI. This paper assessed four regression models using a median-computed LPI, which is less vulnerable to outliers: the multiple linear regression model (MLRM), the ridge regression model, the elastic net model and the LASSO model. The first observation was that the mean- and median-computed LPIs were not different; in prediction, both overfitted on the test data. The mean-computed LPI, however, overfitted more than the median. MLRM used all six variables to produce the best fit to the training set (RMSE = 0.0497, AIC = -952); however, tested on unseen data, it was the least precise (RMSE = 0.0438). On the other hand, LASSO did not fit the training set well (RMSE = 0.3627, AIC = -318) but was the most precise predictive model (RMSE = 0.0436). LASSO, through variable shrinkage and selection, eliminated one irrelevant variable, timeliness. The two models were not significantly different (P = 0.2951, at the 95% confidence level); the value added by LASSO was parsimony. While MLRM used all six variables, LASSO used five to generate a similar model. Policymakers could reliably use the top three variables, which explained 80% of the variability in the model: logistics quality, infrastructure and tracking.
Improving the physical infrastructure, increasing logistics management skills, and implementing intelligent technologies could improve trade competitiveness.
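The abstract above notes that LASSO dropped one variable while the other models kept all six. The sketch below illustrates why this can happen, under a simplifying assumption that is not from the thesis: with an orthonormal design matrix, the LASSO solution is the soft-thresholded least-squares coefficient, whereas the ridge solution only rescales it. The coefficient values and the penalty `lam` are hypothetical.

```python
def soft_threshold(coef, lam):
    """LASSO coefficient under an orthonormal design: shrink by lam, clip at zero."""
    if coef > lam:
        return coef - lam
    if coef < -lam:
        return coef + lam
    return 0.0


def ridge_shrink(coef, lam):
    """Ridge coefficient under an orthonormal design: rescale, never exactly zero."""
    return coef / (1 + lam)


# Hypothetical least-squares coefficients for illustration only.
ols_coefs = {"var_a": 0.80, "var_b": -0.30, "var_c": 0.05}
lam = 0.1

for name, b in ols_coefs.items():
    print(f"{name}: OLS={b:+.2f}  LASSO={soft_threshold(b, lam):+.2f}  "
          f"ridge={ridge_shrink(b, lam):+.3f}")
```

In this sketch the smallest coefficient becomes exactly zero under LASSO but remains non-zero under ridge, which is the selection behaviour the abstract describes as parsimony.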
- Robust statistical learning for optimal classification of imbalanced data (Strathmore University, 2021) Juma, Samuel Wanyonyi
Neurobiological disorders such as Learning Disabilities (LD) are increasingly becoming a major concern in the education and health sectors; hence, precise identification of these disorders is critical. While neuropsychological assessments play an important role in diagnosis, there are limited conventional methodologies for test administration, scoring and interpretation of results. Consequently, children are frequently misclassified because of the imprecise distinction between children with learning disabilities and those with learning difficulties. This research sought to apply statistical and Machine Learning (ML) approaches to strengthen the LD diagnostic process. It addresses the challenges of learning from imbalanced data, a characteristic often associated with LD data because of the low prevalence of the disorder. Imbalanced data poses a challenge in designing efficient ML solutions since standard classification models assume fairly balanced classes. The study used an experimental design to identify a suitable base learner and a corrective technique to tackle the challenge of imbalanced data. The statistical experiments were based on secondary data obtained from a Baseline Survey on Learning Disabilities conducted by the Kenya Institute of Special Education in 2019. It was found that Support Vector Machine (SVM) is the best base learner for imbalanced data, with the highest classification efficiency compared to the other classification models. For data with high dimensionality, the classification power of Artificial Neural Network (ANN) was better than that of SVM, despite the need for significantly higher computational effort. When data dimensionality was reduced, the classification power of ANN fell significantly.
SVM was also found to be a more flexible model whose classification power is least affected by changes in data dimensionality. Both Adaptive Boosting (AdaBoost) and Adaptive Synthetic Sampling (ADASYN) performed equally well in tackling the imbalanced data, with AdaBoost performing slightly better, although the difference was not statistically significant. The study concludes that SVM and ANN can be used to model highly imbalanced data to achieve the highest classification accuracy with respect to the minority class. ADASYN and AdaBoost methods can be used jointly to build a more robust corrective algorithm to tackle highly imbalanced data.
- Using semi-Markov process to model incremental change in HIV staging with cost effect (Strathmore University, 2021) Andrew, Joram Malului
Over the past years, parametric and non-parametric methods have been used to model cost and effectiveness according to one studied event or one health state. In this study we used a semi-Markov model in which the distributions of sojourn times are explicitly defined. The Weibull distribution was chosen to model the hazard function for each transition. Using a regression model for cost, a cumulative cost function was developed, enabling us to determine the estimated mean cost per patient in each state defined in the semi-Markov model. The incremental cost-effectiveness ratio (ICER) was used for cost-effectiveness analysis in comparing two follow-up strategies (patients in DCM and patients not in DCM). Using viral load (VL), three states were defined: VL < 200ml, 200ml < VL < 1000ml, VL > 10000ml, plus an absorbing state, death. The mean cost per patient for states 1, 2 and 3 was $765, $829 and $1395 respectively. The calculated ICER was $483.8268 per life-year saved. Keeping patients in state 1 (on DCM) was relatively cheaper and more efficient than in the other states.
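The ICER mentioned in the abstract above is conventionally the difference in mean cost between two strategies divided by the difference in their mean effectiveness. The sketch below shows that standard formula; the function name `icer` and all input values are hypothetical and do not reproduce the thesis's calculation.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the two strategies have equal effect")
    return (cost_new - cost_ref) / delta_effect


# Hypothetical mean costs (USD) and mean life-years per patient, for illustration only.
ratio = icer(cost_new=1200.0, cost_ref=1000.0, effect_new=5.5, effect_ref=5.0)
print(f"ICER = ${ratio:.2f} per life-year saved")
```

A lower positive ICER means each additional life-year is obtained at lower extra cost; whether a given value is acceptable depends on a willingness-to-pay threshold, which the abstract does not state.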