SU+ Digital Repository
SU+ is an online repository for the preservation and promotion of assorted digital content at Strathmore University
Off-Campus Access to restricted resources (including the ExamsBank) now requires registration using an @strathmore.edu email address
Authentication is NOT required for On-Campus Access to content

Communities in DSpace
Select a community to browse its collections.
Conferences / Workshops / Seminars: Documents and Proceedings of Conferences, Seminars, Workshops (and more) held at Strathmore University
Digital Archives: Assorted collections of resources covering various subject themes contributed by Faculty and Library Staff
Reports / Policies: Public reports and policy documents
Research / Researchers / Publications: Researcher Profiles / Conference presentations / Published research articles / Faculty and Corporate research outputs
Strathmore Heritage Collection: A digital chronicle of the History of the University presented through a mix of pictures, videos and digitized publications
Recent Submissions
Item
Interrupted time series and machine learning with application to effect of Influenza Vaccine
(Strathmore University, 2024) Juma, C. O.
Interrupted time series analysis is increasingly employed to assess the effects of large-scale health interventions. Autocorrelation and seasonality need to be accounted for, but they are not well captured by simple time series models such as segmented regression, which is commonly used. An Autoregressive Integrated Moving Average (ARIMA) model presents an alternative approach to address these issues. In this study, the fundamental principles of ARIMA and LSTM models are expounded upon, along with their application in evaluating interventions at a population level, such as determining the effect of influenza vaccine administration. Considerations such as determining the impact shape, the model selection process, transfer functions, loss functions, the selection of batch sizes and epochs for training the neural networks, evaluation metrics, and the interpretation of results are discussed. Additionally, detailed R and Python codes are provided for result replication. The application of ARIMA and LSTM predictive modeling is demonstrated through an analysis of an influenza vaccination intervention to reduce the number of medically attended respiratory illnesses among children under five years, specifically an influenza vaccination demonstration project that ran from November 2019 to November 2021. In conclusion, ARIMA modeling and LSTM serve as valuable tools for assessing the effects of large-scale interventions when traditional methods are not applicable, given their ability to account for underlying patterns, autocorrelation and seasonality, and their flexibility in modeling various impacts. Comparing the MAE and RMSE error results, LSTM outperformed the ARIMA model.
Key terms: Interrupted time series analysis, Autoregressive integrated moving average models, LSTM, Intervention analysis
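The abstract above compares models by MAE and RMSE. As a minimal sketch of how such a comparison works, the following computes both metrics for two sets of forecasts. The observed values and the forecasts labelled `model_a` and `model_b` are hypothetical numbers chosen for illustration; they are not results from the study.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more heavily than MAE."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

# Hypothetical observed counts and two hypothetical sets of forecasts.
actual = [10.0, 12.0, 14.0, 16.0]
model_a = [11.0, 12.0, 13.0, 18.0]
model_b = [10.0, 13.0, 14.0, 17.0]

for name, forecast in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "MAE:", mae(actual, forecast), "RMSE:", round(rmse(actual, forecast), 4))
```

The model with the lower MAE and RMSE on held-out data would be preferred under this criterion.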
Item
Predicting risky taxpayers in Kenya using machine learning
(Strathmore University, 2024) Cheboi, C. J.
Taxation is a fundamental tool for governments to raise revenue and fulfill their responsibilities to society. It is an essential component of modern governance, facilitating economic development, social welfare, and the provision of goods and services for the benefit of the public. Conversely, tax evasion poses a pervasive challenge impacting both advanced and emerging economies globally. In Kenya, addressing tax evasion is a significant hurdle, with the government estimating substantial annual revenue losses as a result, leaving it to seek debt financing for its programs. This study used machine learning models to classify taxpayers according to certain attributes and predict those who are most likely to evade. The study explored 24 such attributes. The target output variable was the payment time. Six supervised machine learning algorithms were trained on the dataset: Decision Tree, Logistic Regression, Random Forest, XGBoost, Support Vector Machines and Stacking. Among the trained models, the Random Forest classifier exhibited the best performance, with a precision score of 90% and a recall score of 86%. This suggests that the model can effectively predict the risky taxpayers to be subjected to a tax audit with a likelihood of high returns. The study identified the top five features influencing tax evasion prediction as installment tax paid, total liabilities, credit brought forward, withholding value added tax credit and total expenses. Accordingly, adjusting these parameters within specified ranges is anticipated to result in an increased accuracy of the prediction of taxpayer classes. These results offer valuable insights for understanding determinants of tax compliance and enhancing the accuracy of predicting risky taxpayers towards optimizing resource allocation for better tax revenue mobilization outcomes.
Keywords: Taxation, Tax evasion, Risky taxpayers, Tax Audit, Machine Learning
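The abstract above reports classifier quality as precision and recall. As a minimal sketch of what those two scores measure, the following computes them from true and predicted labels. The label vectors are hypothetical, with 1 standing for a "risky" taxpayer; they are not data from the study.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Precision: of those flagged as risky, how many really were risky.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of those who really were risky, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels: 1 = risky taxpayer, 0 = not risky.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print("precision:", precision, "recall:", recall)
```

In an audit setting, high precision means few audits are spent on compliant taxpayers, while high recall means few risky taxpayers are missed.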
Item
Correlated stock identification in pairs trading using extreme gradient boosting algorithm
(Strathmore University, 2024) Muhia, C. N.
Pairs trading is a well-known market-neutral trading strategy that aims to exploit market inefficiencies by identifying and trading pairs of highly correlated stocks. This research addresses the pressing problem of accurately identifying correlated stock pairs for pairs trading strategies, recognizing the potential for reducing risk and generating profits in financial markets. While traditional statistical and deep learning methods have provided valuable insights, there exists a notable research gap in assessing the effectiveness of advanced machine learning algorithms like Extreme Gradient Boosting (XGBoost) in this context. To bridge this gap, the study meticulously compares the performance of the XGBoost algorithm with conventional techniques through quantitative analysis. Leveraging historical stock price data and machine learning methodologies, the research explores the intricacies of stock pairing accuracy and profitability. The findings reveal that the tuned XGBoost model demonstrates superior accuracy, precision, and recall in identifying profitable stock pairs, outperforming traditional statistical methods and other machine learning algorithms. Specifically, the XGBoost model achieved an accuracy of 95.50% and a precision of 95.34% in identifying profitable stock pairs. These results underscore the potential of XGBoost to enhance pairs trading strategies and optimize trading decisions in dynamic financial environments. However, while the XGBoost model showcases remarkable performance, it is not without limitations. Susceptibility to overfitting and reliance on input feature quality and quantity present challenges that need to be addressed. Nonetheless, the study provides valuable insights for investors and traders, suggesting avenues for optimizing trading strategies and maximizing profitability. 
Recommendations include further exploration of XGBoost's capabilities in diverse market conditions and the integration of additional data sources to enhance predictive accuracy. Moreover, the research highlights the need for continued investigation into other advanced machine learning algorithms and ensemble techniques to further improve stock pairing accuracy. Ultimately, this study contributes to advancing pairs trading strategies by providing empirical evidence of XGBoost's effectiveness, while also identifying avenues for future research and development in the field.
Key Words: Pairs Trading, Correlated Stocks, Autoencoders, Self-Organizing Maps, Random Forest, Support Vector Machine, Trading Strategy, Sharpe Ratio, Maximum Drawdown, Cointegration, Backtesting, Machine Learning, XGBoost
Item
Predicting financial inclusion and access to credit in Kenya
(Strathmore University, 2024) Tanui, C.
Financial inclusion, particularly access to credit, is a crucial aspect of economic development in Kenya. This study aims to investigate the determinants of financial inclusion and access to credit in Kenya, employing logistic regression modeling to predict financial inclusion patterns and to construct a forecast model that can support policymakers and financial organizations in boosting financial inclusion. The study analyzed several factors including demographics, technology adoption, financial services usage and barriers to assess their impact on financial inclusion and access to credit. The results revealed that the use of mobile phones and the internet as technological indicators of financial inclusion were the most effective predictors. Contrary to previous studies, gender was not found to significantly affect financial inclusion in this context. The machine-learning model that was developed achieved an overall prediction accuracy of 90.9%. An interactive user dashboard was also developed using flexdashboard in R and hosted on the web, with visualizations and regression models to provide insights into the key factors driving financial access in Kenya. The results showed that demographics, technology adoption, financial services usage and barriers to financial inclusion were the most significant factors that impacted financial inclusion; however, there were no significant correlations between these factors and financial inclusion as a whole. This research study will offer insights into the causes of financial exclusion in the country and how to overcome them.
Item
A Statistical analysis of the log returns of cryptocurrencies
(Strathmore University, 2024) Ndegwa, K. I.
There has been an increase in interest and demand for cryptocurrencies, and understanding their statistical properties is therefore important because these properties indicate their risk. Understanding the risk involved in investing in cryptocurrencies allows one to evaluate that risk against one's own risk tolerance and thus determine whether it is worthwhile to venture into cryptocurrencies and, if so, the optimal weight of the investment in the portfolio. This study seeks to find the statistical distribution from a family of fat-tailed distributions that best explains the log returns of cryptocurrencies. It was conducted in Nairobi between May 2021 and February 2022. The data used was obtained from Yahoo Finance. The results suggested that the Generalized Hyperbolic Distribution gives the best fit for the large cryptocurrencies ranked by market capitalization.
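The log returns studied in the abstract above are the differences of log prices, r_t = ln(P_t / P_{t-1}). As a minimal sketch, the following computes them from a price series. The prices are hypothetical values for illustration, not data from Yahoo Finance or from the study.

```python
import math

def log_returns(prices):
    """Return the series r_t = ln(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical daily closing prices.
prices = [100.0, 105.0, 102.9, 110.0]

returns = log_returns(prices)
print([round(r, 4) for r in returns])

# Log returns are additive over time: their sum equals the log return
# over the whole period, ln(P_last / P_first).
print(round(sum(returns), 6), round(math.log(prices[-1] / prices[0]), 6))
```

This additivity is one reason log returns are a common starting point before fitting a distribution to them.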