MSc.SS Theses and Dissertations
Browsing MSc.SS Theses and Dissertations by Title
- A Comparative study of Hybrid Neural Network and ARIMA Models with application to forecasting intra-day child-line calls in Kenya (Strathmore University, 2022) Wang’ombe, Grace Wairimu. Background: For successful staffing and recruiting of call centre professionals, precise forecasting of the number of calls arriving at the centre is crucial. These projections are needed for various periods, both short and long-term. Benchmark time-series models such as ARIMA and Holt-Winters used in forecasting call centre data are outperformed in long-term forecasts, especially when the data is not stationary. Advanced models such as ANNs can pick up on random peaks or outlying periods better than the benchmark time-series models. The hybrid methodology combines the strengths of the benchmark time-series and advanced models, thus improving overall forecasts. Objective: The study’s primary goal was to assess the superiority of a Hybrid ARIMA-ANN model over its constituent models in forecasting Childline call centre data in Kenya. Methods: The ARIMA, ANN and hybrid ARIMA-ANN models were used to forecast the call centre data. Cross-validation was used to compute forecasting accuracy metrics, which were then compared. The ARIMA model was fitted using the Box-Jenkins methodology, whereas the ANN and the neural network element of the hybrid model were modelled using the feed-forward Neural Network Autoregressive (NNAR) structure. Results: The Seasonal ARIMA-ANN model outperformed the ARIMA model in short-term forecasts and the ANN model in long-term forecasts. The Diebold-Mariano test indicated a significant difference between the hybrid and ANN forecasts, whereas the difference between the hybrid and ARIMA forecasts was not significant. Conclusion: The Hybrid model was able to combine the advantages of both its constituent models to improve its performance.
These results are helpful because call centres can use a single model that is robust enough to produce accurate forecasts, rather than relying on the benchmark models.
- A Systematic comparison of performance of Ridge, Lasso, Elastic net and Relaxed Elastic net when fitting high dimensional data for sales prediction (Strathmore University, 2022) Muoki, Monica Mueni. Forecasting or prediction is one of the most crucial aspects of planning for many companies. Data-driven decisions can only be as accurate as the predictions they are based on. Such decisions include production planning, inventory management, and various forms of resource allocation. Sales information is inherently multi-dimensional and, as a result, not easy to analyse. Our motivation is to reduce the high dimension of this information and select the optimal contributing variables, with the aim of making accurate and reliable sales predictions. The purpose of this study is to compare the performance of four restricted regressions. This involves looking at Ridge, Lasso, Relaxed Elastic net and Elastic net regressions and assessing their prediction performance when dealing with high dimensional data. The proposed method involves comparing the four regularized techniques, citing their restrictions and evaluating their prediction performance. We also use data simulation to test the different models. The simulations are done under different scenarios to represent the reality of a market setting. Afterwards, we select the best model and use it to fit our real sales dataset provided by one of the leading ECMCs in Kenya. The evaluation metrics for these models are Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and R-Squared (R2). On this basis, elastic net offered the best predictions. However, the preferred model based on R2 kept shifting under different scenarios between Lasso, Ridge and Elastic net. The results indicated that the regularized approaches, especially elastic net, are capable of dealing with non-linearity and fluctuating dynamics in the manufacturing industry while predicting electrical cable sales accurately.
- Analysis of recurrent events with associated informative censoring: application to HIV data (Strathmore University, 2020-06) Ejoku, Jonathan. In this study, we adapt a commonly used Cox-based model for recurrent events, the Prentice, Williams and Peterson Total-Time (PWP-TT) model, which has largely been used under the assumption of non-informative censoring, and evaluate it under an informative censoring setting. Empirical evaluation was undertaken with the aid of the semi-parametric framework for recurrent events suggested by Huang and Wang (2004), where a subject-specific latent variable is used to model the association between the recurrent event and the hazard of the failure time. All implementations were made in RStudio, using the reReg package (Chiou and Huang, 2019) with the method in the reReg function set to 'cox.HW'. For validation we used HIV data from a typical HIV care setting in Kenya. Results show that the PWP-TT model generally fit the data well, with a comparison to the Andersen-Gill method showing similar estimates, while the ordinary Cox model estimates were too unreliable.
- Application of Hybrid seasonal ARIMA-GARCH Model in modelling and forecasting fertilizer prices in Kenya (Strathmore University, 2023) Okello, E. A. Volatility in fertilizer prices poses a huge risk to both farmers and suppliers. To manage fertilizer price volatility, a more efficient price risk management model is necessary. Stand-alone models have been criticized for failing to capture the true market conditions because they capture only unilateral information. Better outcomes have been credited to combined models, such as time series models. Existing models have factored in variables such as natural gas, transport, volumes traded, crude oil prices, corn prices, ethanol, market concentration and regions. In this study, the port through which fertilizer is imported is taken into account while creating a Hybrid SARIMA-GARCH model, which is then used to forecast prices. The model’s predictive abilities were assessed using RMSE, MAE, and MASE. The findings of this study suggest that the best model for the port of Gulf is the SARIMA (1, 1, 0)(2, 1, 0)_12 model, with AIC = 997.53 and RMSE = 5.6015, and that it can efficiently capture the pricing behaviour in this port. In Yuzhny, the Hybrid SARIMA (2, 1, 0)(2, 1, 0)_12-GARCH (1, 1) model turned out to be the best fit, with AIC = 7.4389, RMSE = 7.5802, MAE = 5.4797 and MASE = 0.6885. The study concludes that the port through which fertilizer is imported has an effect on the price, as each of the ports under study yielded a unique model. KEY WORDS: Nonlinear time series, Heteroscedasticity, SARIMA model, GARCH model, Hybrid SARIMA-GARCH model, Ljung-Box test, Augmented Dickey-Fuller test.
- Assessing efficient odds ratios: an application to surgical stage prediction in cervical cancer (Strathmore University, 2020) Jesang, Jean C. Background: Cervical cancer remains the second most commonly diagnosed cancer and the third leading cause of cancer death in developing countries. Improving clinicians' knowledge and understanding of surgical staging is critical in the fight against the disease. Kenya has limited research on accurately predicting the surgical stage following surgical treatment for cervical cancer. The uptake of predictive mechanisms by gynecologists has not been common. Objective: To assess prediction by comparing the odds ratios of three popular ordinal regression models, i.e. the Multinomial Logistic Regression (MLR) model, the Continuation Ratio (CR) model and the Adjacent Category Logistic (ACL) model, when applied to cervical cancer data in surgical stage prediction. Method: We systematically compared the performance of MLR, CR and ACL as predictive mechanisms and evaluated the most appropriate model in the cervical cancer setting. The study considered women who visited the Oncology department at the Moi Teaching and Referral Hospital's Chandaria Cancer and Chronic Diseases Center and were diagnosed and surgically treated for cervical cancer from January 2014 to December 2018. Results and conclusion: We presented the comparison between 3 different regression models for ordinal data within the cervical cancer setting. We chose to carry out both an inferential and a predictive approach. The inferential approach found that the CR model without proportional odds yielded better results when comparing the Akaike Information Criterion (AIC), log likelihood ratio and residual deviance. In addition, the key prognostic factor associated with invasive cervical cancer was the FIGO clinical stage, which in particular had a higher influence on the surgical stage 2 outcomes compared to the lesser surgical stage categories.
All the 5 independent features selected for classifying the patients into surgical stages were the FIGO clinical stage and partly, the presence or absence of cancer of symptomatic vaginal discharge. However, the predictive approach found that the MLR, CR and ACL models were not statistically different and not suitable for the prediction of the surgical stage among the women surgically treated for cervical cancer.
- Assessing predictive performance of supervised machine learning algorithms: an alternative model for diamond pricing (Strathmore University, 2022) Kigo, Samuel Njoroge. The world’s hardest mineral is diamond, which is 58 times harder than any other mineral, and its beauty as a jewel has long been appreciated. The diamond is popular due to its optical properties as well as other factors such as its durability, custom, fashion, and strong marketing by diamond producers. Diamond demand, on the other hand, is not directly related to such inherent characteristics, but rather to the perceived value of diamonds as rare and expensive objects. Forecasting diamond pricing is challenging due to non-linearity in important features such as carat, cut, clarity, table, and depth. Given this, we conducted a comparative analysis and implementation of multiple supervised machine learning models for predicting diamond price using both classification and regression approaches. We evaluated eight different supervised algorithms in our work, including Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron, and identified the most suitable model given the selected evaluation metrics. The analysis in this work is based on data preprocessing, exploratory data analysis, training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metric values and analysis, eXtreme Gradient Boosting was the optimal algorithm in both classification and regression, with an R2 score of 97.45% and an Accuracy value of 74.28%. As a result, the eXtreme Gradient Boosting method was recommended for forecasting the price of a diamond specimen.
- A Bayesian approach to Geo spatial analysis of HIV viral load data (Strathmore University, 2019) Kareko, Joy Hilda Mukami. HIV is currently ranked among the leading causes of death in Kenya and in the world, with an estimated 1.5 million Kenyans living with HIV and 28,000 deaths recorded annually as a result of AIDS-related illnesses. In 2014, UNAIDS launched a 90-90-90 strategy whose aim was to diagnose 90 per cent of all HIV-positive persons, provide antiretroviral therapy (ART) for 90 per cent of those diagnosed, and achieve viral suppression for 90 per cent of those treated by 2020. This study is motivated by the need to assess the 3rd 90, viral suppression for 90 per cent of those treated with ART, and seeks to analyze one statistical paradigm (Bayesian) that has conventionally been used for geospatial trends. The Bayesian approach has been used previously to assess the prevalence and incidence of diseases; this dissertation, however, seeks to evaluate the Bayesian approach to spatial trends of HIV Viral Load Suppression in Kenya. We revisit the theoretical framework of the Bayesian approach and apply real data from the Kenyan setting spanning from 2012 to 2017. Results show the Bayesian approach to be robust, in-depth and more informative when modelling spatial trends of Viral Load suppression. Further, First Line ART regimen, HIV-TB co-infection and retention rates are significant predictors of Viral Load suppression spread.
- Classification of X-rays images using Deep Convolutional Neural Network: COVID-19 (Strathmore University, 2021) Bore, Laban Kipchirchir. The increased number of labeled X-ray image archives has triggered increased research work in the application of statistics, machine learning, deep learning, and computer vision across different domains. Recent studies applying deep transfer learning (60) with CNNs to detect and classify the few available COVID-19 datasets have had major success. COVID-19 data have been collected since the outbreak of the virus in the fourth quarter of 2019. The COVID-19 virus has complicated the diagnosis, treatment, and care of patients because there is no cure and the virus mutates into different fatal variants. This has led to thousands of deaths and increased admissions to hospital beds, ICUs, and other health facilities. Hundreds of thousands of new infection cases are reported daily across the world. The overburdening of the health system by the COVID-19 virus has made access to other health services difficult in the under-served world (89). Traditionally, medical doctors carry out several tests, such as full blood count tests to ascertain whether the body is fighting certain pathogens, sputum tests, and chest X-rays. Doctors also examine patients' medical history and carry out physical exams such as listening to the lungs with a stethoscope for abnormal crackling sounds. The success of this traditional diagnosis process depends on the doctors' experience and skills, the quality of the X-ray images and the availability of the patient's historical records. This is almost unattainable and unsustainable in the under-served countries in Africa. The motivation of this paper is to complement the traditional diagnosis and analysis of chest X-ray images by introducing machine classification approaches and the state-of-the-art deep residual network ResNet18 (14, 35).
According to WHO (58), diagnosis is a process and requires classification steps to inform research, health policies, and care of the patients. An alternative definition is a “pre-existing set of categories agreed upon by the medical profession to designate a specific condition” (43). We applied a statistical learning model to separate and classify all the X-ray images with patchy areas into one distinct class for further research, examination, analysis, and care of the patients. The observed white patchy areas in our X-ray images were our statistical variable of interest in classifying chest X-ray images into COVID-19 and non-COVID-19, pg 3.2. In addition, the final model can be replicated on other non-COVID datasets and extended to other related classification tasks. A deep CNN classification model (ResNet18), viewed as a method within non-parametric statistics, was used for classifying and predicting COVID-19 positive images. The datasets used, COVID-19 positive cases (184) and COVID-19 negative cases (5000), were aggregated from different sources. The COVID-19 negative cases came from 10 disease categories (Pneumonia, Pneumothorax, Lung opacity, Fracture, Atelectasis, Edema, pleural, etc). The fine-tuned deep CNN model (ResNet18) performed well, with precision (87.5%), sensitivity (75%) and specificity (99.8%). Rerunning the model on larger datasets, obtained by adding noise through data augmentation, yielded sensitivity (90%) and specificity (100%). Hence, when more data are fed into the neural model, classification performance measures such as precision, AUC and recall improve significantly. This classification model can be used to aid radiologists or medical practitioners in chest X-ray image diagnosis and treatment (59) through categorization, diagnosis, detection, and prediction.
Further extension of this research work will focus on using larger COVID-19 or non-COVID-19 datasets with more focus on systematic review around data acquisition, data certification, model development and pitfalls, and explanation construction (39).
- Comparison of neural networks and tree-based ensemble methods in detecting correlates of breast cancer survival (Strathmore University, 2022) Katam, Ruth Jepchirchir. Breast cancer is common among women, affecting about 2.1 million women each year and causing a large number of cancer-related deaths. Doctors often struggle to diagnose the stage accurately and to determine the needed medication. Therefore, accurate detection of correlates of breast cancer survival is paramount. This study sought to compare the performance of Neural Networks and Tree-based Ensemble methods in predicting breast cancer survival, elucidating the factors associated with breast cancer survival based on clinical data for timely intervention. The accuracy score, recall score, precision score, Area under the Receiver Operating Characteristic Curve, and F1 score were used to evaluate the performance of each model in discerning between breast cancer survivors and non-survivors. XGBoost and LSTM exhibited outstanding performance in the classification of breast cancer patients. However, XGBoost was the optimal model. The results showed that age at diagnosis, pam50+ claudin low subtype her2, 3 gene classifier subtype high profile, radiotherapy, Nottingham prognostic index, type of breast surgery breast conserving, type of breast surgery mastectomy, mutation count, lymph nodes examined positive, tumor stage, tumor size, 3 gene classifier subtype low profile, pre inferred menopausal state and post inferred menopausal state, among others, were the most important correlates of survival from breast cancer.
- Determine the breaking point of Kenya debt an application of extreme value theory (Strathmore University, 2017) Mathenge, Jacqueline Wachuka. The aim of the study is to determine the breaking point of Kenyan public debt through the use of Extreme Value Theory (EVT). EVT focuses on the tail end of distributions in order to identify maxima and minima points. With debt levels rising since devolution, from Kenya Shillings (KES) 500 billion in 2013 to KES 2.5 trillion in 2015, and warnings from international bodies such as the International Monetary Fund (IMF) and the World Bank on rising debt levels, there is a need to determine the sustainability of debt beyond analyst speculation. The special case of EVT known as the Generalized Extreme Value (GEV) application looks at a degenerate distribution factor, thus ensuring that the tail end of the distribution, that is, the maxima, converges to the GEV regardless of the distribution of the data set (no assumption is made on the distribution of the data set). From the study, the Gumbel model was determined to be the most appropriate model and, with a 95% threshold, the GEV projected the total debt maxima to be KES 5 trillion. This is evidence that the current debt level of KES 2.5 trillion is still sustainable but should nevertheless be monitored.
- Developing pediatric prognostic model using finite mixture models (Strathmore University, 2017) Ogero, Morris Ondieki. Background: World Health Organization (WHO) guidelines recommend early identification of patients who have emergency features for early medical intervention, with the aim of reducing child mortality and morbidity. Prognostic models have been developed for use in clinical setups, but their performance in external validations has been dismal. These poor performances have been attributed to the suboptimal statistical methods used to derive these scores. Methods: The Bayesian finite mixture model was used to succinctly identify subpopulations in a population of 47,596 patients from different geographical regions. Mixed models were used to derive a final prognostic model taking into account subgroups of the population. Clinically relevant yet routinely available prognostic factors were used in model development. Results: Amongst the 23 risk factors used, the AVPU scale, which measures unconsciousness, was the strongest predictor of mortality (AOR = 2.94, 95% CI = 2.57 - 3.36). Oedema (AOR = 2.66, 95% CI = 2.18 - 3.24), pallor (AOR = 2.09, 95% CI = 1.86 - 2.36) and the presence of >= 3 severe comorbidities (AOR = 2.19, 95% CI = 1.73 - 2.74) were also associated with an increased risk of death. Conclusion: Given that patients are not alike, a statistical methodology that clusters patients into homogeneous subpopulations should be used to account for the inherent variability among medical patients. Computational methodology such as mixture models should be used to identify the inherent subpopulations that underlie the population of medical patients under study. Limitation: The use of diagnostic episodes as one of the predictors in the model was based on the clinician’s impression (not a laboratory test), thus the possibility of false positives could not be ruled out.
- Distributions of zero-inflated models with application to HIV exposed infants (Strathmore University, 2019) Nekesa, Faith Victory. Data with excess zeros are commonly found in many disciplines, including public health. Several models have been proposed for analyzing this kind of data. The World Health Organization (WHO) indicates that the majority of the 1.8 million children currently living with HIV in sub-Saharan Africa got the virus from their mothers, probably during pregnancy, delivery or breastfeeding, but the study shows there is a drop in the rate of infections due to interventions that have been put in place. Here we attempt to fit zero-inflated models to data in this setting. The objective is to systematically compare the distributions of the various zero-inflated models with an application to HIV Exposed Infants (HEI). We revisited zero-inflated models, conducted simulations and applied the models to HEI data. The models' performance was evaluated using the Akaike Information Criterion (AIC). The simulation results indicated that ZAP had the lowest AIC value, 467.95, at 80% of zeros. The real data also showed ZAP as the best fit, since it had the lowest AIC value. From the AIC values in both the simulation results and the real data results, it is clear that ZAP is the best fitting model.
- Examining Gaussian Mixture Models using clustering algorithms (Strathmore University, 2023) Oloo, J. M. Clustering is an important data mining technique for finding homogeneous and heterogeneous groups in a data set. Identifying these groups from a sales data-set is important for estimating demand for a specific range of products. This research carried out a detailed analysis of Gaussian Mixture Models by using the expectation-maximization method to find optimal clusters on a sales data-set. The method combines the expectation-maximization algorithm with agglomerative hierarchical clustering, resulting in an effective, iterative process for estimating the model’s parameters. In order to give accurate estimates for the ideal number of clusters, the expectation-maximization approach uses the hierarchical clustering to provide an initial guess for the algorithm. The goal is to boost sales performance of products sold by estimating demand and comparing sales over a particular period. The method segmented clients into groups with shared characteristics, such that customers within each subgroup could be offered products and promotions that are likely to interest them. Therefore, this study was interested in maximizing the distance between individual clusters and also minimizing the distance between items belonging to the same cluster. The research experimented with sales data from a large liquor distribution company, examining how variables such as product, customer, sales region, and quantity sold affected overall sales volume and revenue. In order to identify deviation in product sales, the data-set was split into subsets. Also, before clustering and data pre-processing, exploratory data analysis was used to understand the features of the data. To correctly measure the performance of the clustering algorithm the study used the Bayesian Information Criterion as a goodness of fit metric.
The results yielded two distinct clusters from the analysis of 146 products and 223 customers in the dataset. These findings confirmed that Gaussian Mixture Models and EM algorithms are more effective at estimating the underlying key parameters and identifying subgroups of similar products and customers.
- Forecasting Kenya’s GDP using a hybrid neural network and ARIMA model (Strathmore University, 2020-03) Ngige, Isabel Wanjiru. Background: Gross Domestic Product (GDP) is the market value of goods and services produced within a selected geographical area, usually a country, in a selected interval of time, often a year, and can be measured and forecasted in different ways for use by governments and other market participants. Specific users of information on GDP analysis include the United Nations’ Sustainable Development Goal assessment, whose key indicator is economic growth as measured by GDP, and the joint International Monetary Fund-World Bank methodology for conducting standardized debt-sustainability analyses in low-income countries. Objective: The main objective of this study was to assess the superiority, as suggested by the literature, of a Hybrid Autoregressive Integrated Moving Average (ARIMA) and feed-forward Artificial Neural Network (ANN) model over a pure ARIMA model in forecasting Kenya’s GDP. Methods: The ARIMA and the additive ANN-ARIMA Hybrid models are used to forecast absolute GDP values, and the comparative forecast accuracy is tested using the RMSE and visualization plots. The Box-Jenkins methodology is used to fit the ARIMA model, while the feed-forward Neural Network Autoregressive (NNAR) structure is used to model the neural network portion of the hybrid model.
- Forecasting of the inflation rates in Kenya: a comparison of ANN, ARIMA and SARIMA (Strathmore University, 2021) Kogei, Victor Kiprono. Monetary policies such as price stability are regulated by the Central Bank of Kenya (CBK). Price stability is a key indicator of stable and predictable inflation. Accuracy and reliability in forecasting inflation rates, or in predicting their trend correctly, are essential to investors, academia and policymakers. This calls for models that predict inflation rates accurately in order to spur investment and economic growth. Intelligence-based models have been found to be robust in forecasting financial and economic series such as inflation rates and stock prices. This research, therefore, uses an artificial neural network to forecast the inflation rates in Kenya and compares its performance with the statistical models ARIMA and SARIMA. Artificial neural network models emulate the information processing capabilities of the neurons of the human brain, making them flexible enough to map inputs to outputs well. A major advantage of ANNs is their ability to capture both linear and non-linear patterns in data because, unlike statistical models, they do not rely on distributional assumptions. The inflation rates, Gross Domestic Product (GDP) and exchange rates were the variables used. The variables are monthly data from January 2012 to February 2021. The prediction performance of the three models was evaluated through RMSE, MAE and MAPE. The results show that artificial neural networks outperformed the ARIMA and SARIMA models. The implication is that the government can adopt an artificial neural network for forecasting inflation rates in Kenya.
- Forecasting the term structure of interest rates in Kenya using Bayesian models post 2007-2008 financial crisis (Strathmore University, 2022) Bosire, Luycer Nyanchama. Despite the significant advances in the modelling of the term structure of interest rates after the great recession of 2008, little attention has been paid to the problem of forecasting the term structure, which has proven to be an important rate in several products and instruments offered by financial institutions. This dissertation makes use of a Dynamic Nelson-Siegel model with a Time-Varying Vector Auto-Regressive component to fit a model and forecast the h-step ahead expected yield. The model makes use of four parameters representing a decay factor and level, slope and curvature latent factors estimated with high efficiency. We propose to use our DNS-TV-VAR model to estimate our factors and demonstrate the model's consistency with a range of stylized yield curve initial data. We apply the model in forecasting a term structure for short and long horizons and conclude that the forecasts appear more accurate for long horizons.
- Identifying the best method to correct for missing data, a case of HIV/TB co-infection in Kenya (Strathmore University, 2020) Mwaro, Joshua Owuori. Missing information is almost inevitable in research, but many researchers only report on complete cases. Here we review missing data theory and missingness characteristics, look at the background information, the importance of studying missing data and the most common ways of correcting for missing data, and then extend these to the Kenyan HIV/TB co-infection setting. We review most of the existing methods of dealing with missing data and what other scholars have done in the missing data area. In the methodology section, we outline the characteristics and features of the four methods for dealing with missing data on which our study is focused (analysis of complete cases only, the mean/single imputation method, the MLE method, and the Multiple Imputation method). We also test the four methods on simulated data and then apply the same procedure to the real Kenyan HIV/TB co-infection data. Results showed that analysis of data corrected for missingness using complete cases only, the weighted method, the likelihood-based method, and multiple imputation estimated the Kenyan HIV/TB co-infection rate to be 29%, 27%, 26%, and 21% respectively. The results indicate that MI is the best approach to correct for missing data as it does not overestimate the HIV/TB co-infection rate.
- Identifying the optimal time series model to predict Kenyan stock prices (Strathmore University, 2023) Moenga, P. K. Prior research indicates that a rise in the stock market has been associated with a corresponding upsurge in economic growth. Investing in stocks bolsters a nation’s economy by mobilizing long-term financial assets for production, while mitigating potential investment risks through diversification. Hence, the stock market remains significant as governments worldwide pursue economic advancement as a primary objective. Investing in the stock market bears inherent risk due to high volatility and the intricate and capricious nature of the market. In order to make informed investment decisions, investors and market analysts must diligently analyze market behavior and formulate effective purchasing or selling strategies. One way to understand market behavior is to forecast future values and to judge the timing of investments well. Investors have endeavored to devise various models that can precisely forecast the future values of stocks. This study aims to contribute to the forecasting of stock prices for Kenyan companies by identifying the optimal time series model. It employed the ARIMA and Prophet models in order to determine the most suitable time series model for predicting share prices in Kenya. It used daily data for SAFARICOM PLC, Equity Group Holdings Limited (NSE: EQTY), KCB Group Limited (NSE: KCB), East African Breweries Limited (NSE: EABL) and Co-Operative Bank of Kenya Limited (NSE: COOP) for a period of five years, from January 2017 to December 2021. The data set consisted of 1248 trading days.
The Root Mean Square Error (RMSE) was used to assess the models and determine the optimal time series model for the prediction of stock prices. The study found that the ARIMA model exhibited superior predictive performance compared with the Prophet model in forecasting Kenyan stock prices. The study suggests that future research may benefit from a larger sample size and from covering multiple industries to improve the generalizability of the findings.
- Item: Improving performance of hurdle models using Rare-Event Weighted Logistic Regression: application to maternal mortality data (Strathmore University, 2022) Okello, Sharon Awuor. Hurdle models, which are commonly used alongside zero-inflated models to analyze dispersed zero-inflated count data, employ a logit link function to predict whether an observation takes a positive count or a zero count based on a set of covariates. However, the logit model tends to be biased toward the majority zero class in cases involving rare events, and may underestimate the positive counts when their proportion is significantly smaller than that of the zero counts. This research aimed to improve the performance of hurdle models by incorporating a rare-event weighted logistic regression model. Poisson and Negative Binomial (NB) Hurdle Rare Event Weighted Logistic Regression (REWLR) models were developed and fitted under various simulation conditions and to maternal mortality data, with performance evaluated using the Akaike Information Criterion (AIC) and Area Under Curve (AUC). The Negative Binomial Hurdle REWLR emerged as the best performing of all the evaluated models owing to its ability to handle dispersion and adjust for class imbalance. The research findings will provide reliable estimates of the maternal mortality ratio in Nairobi without the risk of over-fitting zero counts.
- Item: A Joint modelling approach of monthly anthropometry and time to death among hospitalized severe malnourished children in Kenya (Strathmore University, 2021) Maronga, Christopher Sianyo. Background: In follow-up studies, interest often lies in understanding the association between biomarkers measured over time and a time-to-event outcome. For this, a two-stage separate analysis or a time-dependent Cox model is often used. The former approach does not account for shared features between the two processes, while the latter ignores the endogeneity in the biomarker, resulting in inefficient and biased estimates. The objective of this project was to fit joint models on longitudinal anthropometry and time to death among children hospitalized with complicated SAM in four hospitals in Kenya. Methods: Data from a randomised placebo-controlled trial for 1,778 children aged 2 to 59 months admitted to hospital with complicated Severe Acute Malnutrition (SAM) but without HIV were analysed. We used linear mixed effects models to model longitudinal anthropometry and a Cox proportional hazards model to assess the effect of a priori selected baseline covariates on mortality. The two models were linked through current value and slope association to create a joint model used to study the effect of longitudinal anthropometry on risk of death. Results: The joint model results showed that a unit centimetre gain in monthly mid-upper arm circumference (MUAC) was associated with a 46.8% reduction in hazard of death, 0.532 (95% CI: 0.476-0.596), while a unit gain in standard deviation (SD) for weight-for-height (WHZ) was associated with a 37.1% reduction in the risk of death, 0.629 (95% CI: 0.579-0.683). A unit gain in SD for monthly weight-for-age (WAZ) and height-for-age (HAZ) was associated with a 21.2%, 0.788 (95% CI: 0.742-0.837), and 2.5%, 0.227 (95% CI: 0.008-6.556), reduction in risk of mortality respectively.
Conclusion: In studying the relationship between a survival outcome and covariates, researchers often use baseline values of the covariates, which fails to account for the interdependencies. Using the joint modelling framework, we quantified the association between four longitudinal anthropometric measures and risk of death. Through current value and slope association, MUAC and WHZ had the strongest associations with risk of death; they are therefore better metrics and can be used to screen for and identify high-risk children.