MSc.SS Theses and Dissertations (2022)

Recent Submissions

  • Item
    Statistical learning for class imbalanced data: a case study of Malaria indicator survey data
    (Strathmore University, 2022) Ongera, Maangi Daniel
    Class imbalance problems are common in real-life applications, and in most cases the minority class is the more important one. Standard statistical learning algorithms tend to produce poor results for the minority class and very good results for the majority class. One widely used mechanism to address this problem is re-sampling the training data. The objective of this study is to examine the performance of statistical learning algorithms under different re-sampling approaches for handling class imbalance. Methods: Two classical and two ensemble statistical learning techniques were trained on an imbalanced Malaria Indicator Survey data set while handling the majority-minority problem through re-sampling. These were logistic regression, support vector machines, random forest, and extreme gradient boosting. The algorithms were first trained without handling class imbalance. They were then trained using six re-sampling procedures to handle class imbalance: random under-sampling, random over-sampling, the Synthetic Minority Oversampling Technique (SMOTE), Random Over-Sampling Examples (ROSE), and the Adaptive Synthetic Sampling Approach (ADASYN). We further investigated whether combining randomly under-sampled and over-sampled data can improve performance. Eighty percent of the data was used for model training with 5-fold cross-validation. Results: All methods considered for handling class imbalance had strengths and weaknesses depending on the performance metric. For instance, random under-sampling (RU) resulted in models with higher sensitivity than random over-sampling (RO). To obtain a trade-off between sensitivity and specificity, these two methods can be combined (RURO). This approach resulted in 99.5% sensitivity, 88.1% specificity, 89.6% precision, a 94.3% F1 score, and 93.9% accuracy on the test set using the extreme gradient boosting machine.
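The abstract does not say how the under-sampled and over-sampled data were combined. The sketch below shows one plausible reading, in which every class is resampled to a common size: larger classes are randomly under-sampled and smaller ones randomly over-sampled. The function name `random_under_over`, the midpoint target size, and the toy data are assumptions made for illustration, not the thesis's implementation.

```python
import random
from collections import Counter


def random_under_over(rows, labels, target_per_class=None, seed=0):
    """Resample every class to the same size (hypothetical helper).

    Classes larger than the target are under-sampled without replacement;
    smaller classes are over-sampled by adding duplicates drawn with
    replacement.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Assumption: default target is the midpoint between the smallest
    # and largest class sizes.
    if target_per_class is None:
        sizes = [len(members) for members in by_class.values()]
        target_per_class = (min(sizes) + max(sizes)) // 2

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            sampled = rng.sample(members, target_per_class)  # under-sample
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            sampled = members + extra  # over-sample
        out_rows.extend(sampled)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Toy data: 10 minority rows (label 1) and 90 majority rows (label 0).
rows = [[i] for i in range(100)]
labels = [1] * 10 + [0] * 90
new_rows, new_labels = random_under_over(rows, labels)
print(Counter(new_labels))
```

Re-sampling of this kind is normally applied to the training portion only, so that the held-out test set keeps the original class distribution.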
  • Item
    Temporal-difference comparison of learning methods for stock market prediction
    (Strathmore University, 2022) Maina, Stephen Gakuo
    Background: A stock or securities exchange is considered to be among the primary indicators of a country's economic strength and development. Stock market prices are volatile and are affected by factors such as inflation and economic growth. Prices depend heavily on demand and supply dynamics. Stock market price determination using artificial neural networks (ANNs) has gained considerable traction lately because of the advantages it would offer traders. Most methods in use today are based on feed-forward algorithms; evolutionary techniques remain largely unexplored for this task despite their robustness. Method: Using data from the Nairobi Securities Exchange, specifically the NSE20 share index, the project applies traditional ANN techniques for stock market prediction and compares them against the relatively new evolutionary algorithms. The Mean Absolute Percentage Error (MAPE), the Root Mean Square Error (RMSE), and a confusion matrix are calculated for performance evaluation. Results: The empirical results showed that the proposed evolutionary techniques outperformed the classic artificial neural network method, feed-forward backpropagation.
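For reference, the two error metrics named in the abstract can be computed as in the minimal sketch below. The function names and the three sample values are hypothetical and are not NSE20 data.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up index levels and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(round(mape(actual, predicted), 3))
print(round(rmse(actual, predicted), 3))
```

MAPE is scale-free, which makes it convenient for comparing models across series, while RMSE penalizes large errors more heavily.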
  • Item
    Assessing predictive performance of supervised machine learning algorithms: an alternative model for diamond pricing
    (Strathmore University, 2022) Kigo, Samuel Njoroge
    The world’s hardest mineral is diamond, which is 58 times harder than any other mineral, and its beauty as a jewel has long been appreciated. The diamond is popular because of its optical properties as well as its durability, custom, fashion, and strong marketing by diamond producers. Diamond demand, on the other hand, is not directly related to such inherent characteristics, but rather to the perceived value of diamonds as rare and expensive objects. Forecasting diamond prices is challenging because of non-linearity in important features such as carat, cut, clarity, table, and depth. Given this, we conducted a comparative analysis and implementation of multiple supervised machine learning models for predicting diamond price using both classification and regression approaches. We evaluated eight supervised algorithms: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron, and identified the most suitable model under the selected evaluation metrics. The analysis is based on data preprocessing, exploratory data analysis, training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics, eXtreme Gradient Boosting was the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, the eXtreme Gradient Boosting method was recommended for forecasting the price of a diamond specimen.
  • Item
    Machine learning based prediction of life expectancy
    (Strathmore University, 2022) Lipesa, Brian Aholi
    The social and financial systems of many nations throughout the world are significantly impacted by life expectancy (LE) models. Numerous studies have pointed out the crucial effects that life expectancy projections will have on societal issues and the administration of the global healthcare system. These approaches offer a variety of strategies to enhance society-related advanced care planning and healthcare. Over time, research has shown that the vast majority of the existing factors were insufficient to forecast the lifespan of the general population. Earlier models were founded on an understanding of the death rate of the chosen sampling population. Researchers have asserted that, despite improvements in forecasting approaches and meticulous past work, several elements beyond death rates must still be taken into account to determine life expectancy rates. As a result, life expectancy research now includes a broader focus on issues related to education, health, the economy, and social welfare. In this study, the author developed a model for estimating life expectancy rates that takes health, socioeconomic, and behavioral characteristics into consideration by applying the eXtreme Gradient Boosting (XGBoost) algorithm to data from 193 UN member states. The effectiveness of the model’s predictions was compared to that of the Random Forest (RF) and Artificial Neural Network (ANN) regressors used in earlier research. XGBoost attained an MAE of 1.554 and an RMSE of 2.402. It outperformed the RF model, which achieved an MAE of 7.938 and an RMSE of 11.304, and the ANN model, which achieved an MAE of 3.86 and an RMSE of 5.002. The overall results of this study support XGBoost as a reliable and efficient model for estimating life expectancy.
  • Item
    Forecasting the term structure of interest rates in Kenya using Bayesian models post 2007-2008 financial crisis
    (Strathmore University, 2022) Bosire, Luycer Nyanchama
    Despite significant advances in modelling the term structure of interest rates after the Great Recession of 2008, little attention has been paid to the problem of forecasting the term structure, which has proven to be an important input to several products and instruments offered by financial institutions. This dissertation uses a Dynamic Nelson-Siegel model with a Time-Varying Vector Autoregressive component (DNS-TV-VAR) to fit a model and forecast the h-step-ahead expected yield. The model uses four parameters: a decay factor and the level, slope, and curvature latent factors, all estimated with high efficiency. We use the DNS-TV-VAR model to estimate the factors and demonstrate the model's consistency across a range of stylized initial yield curve data. We apply the model to forecasting the term structure over short and long horizons and conclude that the forecasts appear more accurate for long horizons.
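As background, the sketch below evaluates a Nelson-Siegel yield curve at a single maturity from the four quantities named in the abstract. It assumes the commonly used Diebold-Li factor loadings; the dissertation's exact parameterization and its time-varying VAR dynamics for the factors are not shown, and the function name and numeric values are hypothetical.

```python
import math


def nelson_siegel_yield(tau, level, slope, curvature, decay):
    """Yield at maturity tau (> 0) under the Diebold-Li form of Nelson-Siegel.

    y(tau) = level
             + slope     * (1 - exp(-decay*tau)) / (decay*tau)
             + curvature * ((1 - exp(-decay*tau)) / (decay*tau) - exp(-decay*tau))
    """
    x = decay * tau
    slope_loading = (1.0 - math.exp(-x)) / x
    curvature_loading = slope_loading - math.exp(-x)
    return level + slope * slope_loading + curvature * curvature_loading


# Illustrative factor values only (not estimates from Kenyan data).
for maturity in (0.25, 1.0, 5.0, 10.0):
    print(maturity, round(nelson_siegel_yield(maturity, 12.0, -3.0, 1.5, 0.5), 4))
```

In this form the level factor sets the long-maturity yield, the slope factor dominates at short maturities, and the curvature factor has its largest effect at intermediate maturities, with the decay parameter controlling where that hump sits.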