SU+ Digital Repository
SU+ is an online repository for the preservation and promotion of assorted digital content at Strathmore University
Off-Campus Access to restricted resources (including the ExamsBank) now requires registration using an @strathmore.edu email address
Authentication is NOT required for On-Campus Access to content

Communities in DSpace
Select a community to browse its collections.
- Documents and Proceedings of Conferences, Seminars, Workshops (and more) held at Strathmore University
- Assorted collections of resources covering various subject themes contributed by Faculty and Library Staff
- Public reports and policy documents
- Researcher Profiles / Conference presentations / Published research articles / Faculty and Corporate research outputs
- A digital chronicle of the History of the University presented through a mix of pictures, videos and digitized publications
Recent Submissions
Statistical and machine learning approaches to assessing foreign aid effectiveness in Kenya: an ARDL framework (Strathmore University, 2025)
Rutto, N. J.
This study investigates the impact of foreign aid and other macroeconomic factors on household consumption in Kenya, using household consumption as a proxy for poverty. It adopts a hybrid methodological approach, combining traditional econometric modelling with modern machine learning modelling techniques to balance causal inference with predictive accuracy. The analysis is anchored in the Autoregressive Distributed Lag (ARDL) framework, which is well-suited to small samples and mixed integration orders. After establishing the presence of cointegration among variables, the model is reparameterised into an Error Correction Model (ECM) to distinguish between short-run and long-run effects. Diagnostic tests confirm the model’s robustness. A Granger causality test reveals no temporal precedence from foreign aid to household consumption, while GDP per capita consistently emerges as a significant long-run driver. To complement the explanatory power of ARDL, three machine learning models, LASSO regression, Random Forest, and XGBoost, are implemented to assess their ability to predict changes in household consumption. The LASSO model demonstrates the best performance across all evaluation metrics (MAE, RMSE, R²), outperforming traditional ARDL and more complex ML models. Feature importance analyses using permutation importance and SHAP values reinforce the dominance of GDP per capita and lagged effects of foreign aid as key predictors. Findings indicate that while econometric methods offer nuanced insight into short- and long-term dynamics, machine learning provides superior predictive power. The study underscores the potential of a hybrid modelling approach in low-frequency macroeconomic contexts, where data constraints limit the application of purely data-hungry methods. Ultimately, the results contribute to the discussion on how aid and macroeconomic variables influence poverty outcomes in developing economies.

Application of an integrated Bayesian Network-Artificial Neural Network model in prediction of brand preference (Strathmore University, 2025)
Ombaka, R. A.
The decision-making processes surrounding infant formula selection present significant challenges for public health interventions and market strategists, necessitating robust and interpretable predictive models. This study applied and comparatively evaluated standalone Bayesian Network (BN) and Artificial Neural Network (ANN) models, alongside an integrated BN-ANN architecture, to classify infant formula choices within a Djibouti context. Employing a sequential integration strategy, where BN-derived probabilistic inferences informed the ANN’s feature set, the research analyzed model performance, feature importance, and interpretability. While the standalone BN model offered valuable probabilistic insights into conditional dependencies (Accuracy: ∼0.8500), the standalone ANN demonstrated significantly superior predictive power (Accuracy: 0.9495). Crucially, the integrated BN-ANN model achieved the highest predictive accuracy (0.9524) and Kappa score (0.9276), indicating a consistent, albeit marginal, gain. Feature importance analysis revealed the taste, brand innovation, and completeness in range of baby formula products as the most dominant predictors. Unexpectedly, BN-derived probabilities contributed minimally as direct ANN features, a phenomenon potentially attributed to information loss from the necessary discretization of continuous variables for the BN, or insufficient variability in the generated probabilities. The study also noted near-perfect classification for one specific formula class, primarily due to highly separable intrinsic features. This research underscores the significant potential of hybrid artificial intelligence for complex multi-class classification, furnishing actionable insights for stakeholders in the infant nutrition sector in Djibouti. Furthermore, it contributes methodologically to the advancements in integrating probabilistic models with deep learning.
Keywords: Bayesian Network; Artificial Neural Network; Hybrid Models; Infant Formula Choice; Predictive Modeling; Feature Importance; Consumer Behavior.

Modelling child survival in malaria-endemic regions of Kenya using Bayesian Generalized Log Logistic models (Strathmore University, 2025)
Masyuko, B. N.
Child mortality remains a major public health challenge in malaria-endemic regions, particularly in sub-Saharan Africa, where children under five face disproportionately high risks. In Kenya, the burden is amplified by factors such as limited healthcare access, socioeconomic disparities, and insufficient malaria prevention. This study develops and applies a Bayesian Generalized Log-Logistic (GLL) survival model to examine how maternal health practices and regional and household-level socioeconomic conditions influence child survival in malaria-endemic regions. Using nationally representative data from the 2020 Kenya Demographic and Health Survey (KDHS) and Malaria Indicator Survey (MIS), the model captures a wide range of hazard shapes, including both increasing and decreasing risk patterns. To strengthen the model’s theoretical foundation, its asymptotic properties were derived to ensure consistency and efficiency in parameter estimation under large-sample conditions. The Bayesian framework further enables robust inference by incorporating prior information and quantifying uncertainty. Posterior predictive checks demonstrated good model fit, confirming the model’s capacity to reflect the observed survival dynamics. Key predictors of child survival included antenatal care utilization, household wealth, regional malaria endemicity, and malaria prevention behaviors. The study concludes that the Bayesian GLL model is a robust and flexible tool for understanding child mortality risk and can inform the design of targeted public health interventions in high-burden settings like Kenya.

Regularized Vector Autoregressive Model for time series data with multiple covariates (Strathmore University, 2025)
Waweru, E. N.
This study develops a Regularized Vector Autoregressive (VAR) model to address challenges like high dimensionality, multicollinearity, and overfitting in time series forecasting with external covariates. Traditional VAR models often struggle with scalability and stability in high-dimensional contexts. By incorporating Ridge, Lasso, and Elastic Net regularization techniques, the model enhances forecasting accuracy and model interpretability. Using a real-world dataset of Walmart sales, including weekly sales alongside economic and environmental covariates, the methodology applies preprocessing, regularized model formulation, and cross-validation for parameter tuning. Performance is evaluated using metrics like Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), comparing traditional and regularized VAR models. The findings demonstrate the utility of regularized VAR models in handling complex time series data influenced by external covariates, with implications for broader applications in fields such as finance, healthcare, and environmental science.
Keywords: Regularized VAR model, Time Series Forecasting, Multiple Covariates, High Dimensionality, Overfitting.

Suspicious transaction prediction in Kenyan digital payments: a machine learning comparative study with imbalanced data (Strathmore University, 2025)
Swaleh, I. J. A.
The surge in digital payments in Kenya has heightened financial crime risks, including money laundering and terrorist financing. Despite regulatory mandates, Suspicious Transaction Reports (STRs) from Payment Service Providers (PSPs) remain below expectations. Traditional rule-based systems often fail to detect such activities, driving interest in machine learning (ML) methods like Random Forest, k-Nearest Neighbours, and Support Vector Machines. However, comparative research on these models, especially in handling severe class imbalance in Kenyan financial datasets, remains limited. This study therefore evaluated four ML algorithms (Random Forest, Support Vector Machine, k-Nearest Neighbours and Logistic Regression) for detecting suspicious transactions. To address class imbalance, the SMOTE-ENN re-sampling technique was applied. Factor Analysis for Mixed Data (FAMD) was used for dimensionality reduction, and model performance was assessed using F1-score and Matthews Correlation Coefficient (MCC). Random Forest outperformed other models post-re-sampling (MCC 99.93%, F1-score 99.94%). Logistic Regression showed the greatest sensitivity to class imbalance, with MCC improving from 62.87% to 97.47%. kNN and SVM also recorded significant gains. Key predictors included Business Age, Score Rank, and Product Type. The findings underscored the importance of using MCC and F1-score over accuracy when evaluating models on imbalanced datasets. They also supported the adoption of hybrid re-sampling techniques, specifically SMOTE-ENN, to enhance model performance, and highlighted Random Forest as a particularly effective algorithm for fraud detection. Future research should explore advanced models such as XGBoost and leverage more diverse datasets to better capture evolving fraud patterns.
Keywords: suspicious transaction reporting; digital payments; machine learning; class imbalance; SMOTE-ENN; fraud detection; random forest; F1-score; MCC.
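The last abstract above recommends MCC and F1-score over accuracy on imbalanced data. The following is a minimal sketch of why, not the study's code: the labels are synthetic (990 legitimate and 10 suspicious transactions), the "classifier" is a hypothetical one that flags nothing, and all function names are illustrative. Returning 0.0 when a denominator is zero is a common convention, assumed here.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = suspicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when no positives are involved.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    # Matthews Correlation Coefficient; defined as 0.0 here when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Synthetic, heavily imbalanced labels: 1% of transactions are suspicious.
y_true = [0] * 990 + [1] * 10
# Hypothetical degenerate classifier that never flags a transaction.
y_pred = [0] * 1000

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
f1 = f1_score(*counts)
mcc_value = mcc(*counts)
print(f"accuracy={acc:.2f}  F1={f1:.2f}  MCC={mcc_value:.2f}")
```

In this toy case accuracy looks excellent even though no suspicious transaction is detected, while F1 and MCC both fall to zero, which is the behaviour the abstract's recommendation relies on.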