MSc. DSA Theses and Dissertations

Permanent URI for this community

http://hdl.handle.net/11071/15646

Utilizing Convolution Neural Networks for enhanced lung cancer classification through CT scan analysis
(Strathmore University, 2024) Korir, P. J.
Lung cancer is the leading cause of cancer mortality, and accurate and timely diagnosis remains a significant challenge, especially in resource-constrained regions like Kenya. The traditional method of diagnosing lung cancer through Computed Tomography (CT) scans often involves manual interpretation, leading to potential delays and inaccuracies. This research aims to harness the power of Artificial Intelligence (AI) to improve the diagnostic process. The study developed a Convolutional Neural Network (CNN) model for enhanced classification of cancer from CT scan images by fine-tuning the pre-trained ResNet50 architecture. Using PyTorch, a leading deep learning framework for computer vision, the model was trained on a curated dataset from the public Lung Image Database Consortium (LIDC), a medical imaging database for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. The collected CT scan images cover several classes: adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and normal tissue. Data pre-processing techniques such as resizing, normalization, conversion, and data augmentation were used to ensure compatibility with the pre-trained model. The model’s performance was evaluated with a range of metrics, demonstrating an accuracy of 87.5%, precision of 80.97%, and an F1 score of 77.4%. These results indicate a promising capability for the model to accurately classify types of lung cancer, supporting its potential use in clinical settings. The fine-tuned model was then integrated into a web-based application using the Flask framework, with a frontend built in Vue.js to provide an intuitive image upload experience. The Flask API handles communication between the frontend and the ResNet50-based machine learning model. When a CT scan image is uploaded, it is sent to the Flask backend as an HTTP request. 
The Flask application processes each request, extracts the image data, and prepares it for analysis by the ResNet50 model, which then classifies the image and returns the result.
Predicting risky taxpayers in Kenya using machine learning
(Strathmore University, 2024) Cheboi, C. J.
Taxation is a fundamental tool for governments to raise revenue and fulfill their responsibilities to society. It is an essential component of modern governance, facilitating economic development, social welfare, and the provision of goods and services for the benefit of the public. Conversely, tax evasion poses a pervasive challenge impacting both advanced and emerging economies globally. In Kenya, tax evasion remains a significant hurdle: the government estimates substantial annual revenue losses, which leave it to seek debt financing for its programs. This study used machine learning models to classify taxpayers according to certain attributes and predict those who are most likely to evade. The study explored 24 such attributes. The target output variable was the payment time. Six supervised machine learning algorithms were trained on the dataset: Decision Tree, Logistic Regression, Random Forest, XGBoost, Support Vector Machines, and Stacking. Among the trained models, the Random Forest classifier performed best, with a precision score of 90% and a recall score of 86%. This suggests that the model can effectively predict the risky taxpayers to be subjected to a tax audit with a likelihood of high returns. The study identified the top five features influencing tax evasion prediction as installment tax paid, total liabilities, credit brought forward, withholding value added tax credit, and total expenses. Accordingly, adjusting these parameters within specified ranges is anticipated to increase the accuracy of the prediction of taxpayer classes. These results offer valuable insights for understanding determinants of tax compliance and for improving the prediction of risky taxpayers, with a view to optimizing resource allocation for better tax revenue mobilization. Keywords: Taxation, Tax evasion, Risky taxpayers, Tax Audit, Machine Learning
Interrupted time series and machine learning with application to effect of Influenza Vaccine
(Strathmore University, 2024) Juma, C. O.
Interrupted time series analysis is increasingly employed to assess the effects of large-scale health interventions. However, autocorrelation and seasonality are not well captured by simple implementations of time series models such as the commonly used segmented regression. An Autoregressive Integrated Moving Average (ARIMA) model presents an alternative approach to address these issues. In this study, the fundamental principles of ARIMA and LSTM models are expounded upon, along with their application in evaluating interventions at a population level, such as determining the effect of influenza vaccine administration. Considerations such as determining the impact shape, the model selection process, transfer functions, loss functions, the selection of batch sizes and epochs for training the neural networks, evaluation metrics, and the interpretation of results are discussed. Additionally, detailed R and Python codes are provided for result replication. The application of ARIMA and LSTM predictive modeling is demonstrated through an analysis of an influenza vaccination intervention to reduce the number of medically attended respiratory illnesses among children under five years. Specifically, the analysis concerns an influenza vaccination demonstration project that ran from November 2019 to November 2021. Comparing the MAE and RMSE error results, LSTM outperformed the ARIMA model. In conclusion, ARIMA modeling and LSTM serve as valuable tools for assessing the effects of large-scale interventions when traditional methods are not applicable, given their ability to consider underlying patterns, autocorrelation, and seasonality, and their flexibility in modeling various impacts. Key terms: Interrupted time series analysis, Autoregressive integrated moving average models, LSTM, Intervention analysis
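For readers unfamiliar with the segmented regression baseline mentioned in the abstract above, the following is a minimal sketch rather than the thesis's own code. It assumes the common four-term form (intercept, pre-intervention slope, level change, slope change) and fits it by ordinary least squares in plain Python. The function names, the intervention index `t0`, and the synthetic series are all hypothetical. The sketch ignores autocorrelation and seasonality, which is the limitation the abstract attributes to this approach.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_segmented_regression(y, t0):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*post_t*(t - t0) by least squares.

    post_t is 1 from the (hypothetical) intervention index t0 onward, else 0.
    Returns [b0, b1, b2, b3]: baseline level, baseline slope,
    level change at t0, and slope change after t0.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, post * (t - t0)])
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic monthly counts (illustrative only): an upward trend, then a drop
# in level and a change in slope after a hypothetical intervention at month 12.
series = [50 + 0.5 * t + (-10 - 1.0 * (t - 12) if t >= 12 else 0) for t in range(24)]
b0, b1, level_change, slope_change = fit_segmented_regression(series, t0=12)
print(f"level change: {level_change:.2f}, slope change: {slope_change:.2f}")
```

An ARIMA or LSTM model would replace the straight-line segments here with a model of the series' own dynamics, so that serial dependence is not mistaken for an intervention effect.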
Correlated stock identification in pairs trading using extreme gradient boosting algorithm
(Strathmore University, 2024) Muhia, C. N.
Pairs trading is a well-known market-neutral trading strategy that aims to exploit market inefficiencies by identifying and trading pairs of highly correlated stocks. This research addresses the pressing problem of accurately identifying correlated stock pairs for pairs trading strategies, recognizing the potential for reducing risk and generating profits in financial markets. While traditional statistical and deep learning methods have provided valuable insights, there exists a notable research gap in assessing the effectiveness of advanced machine learning algorithms like Extreme Gradient Boosting (XGBoost) in this context. To bridge this gap, the study meticulously compares the performance of the XGBoost algorithm with conventional techniques through quantitative analysis. Leveraging historical stock price data and machine learning methodologies, the research explores the intricacies of stock pairing accuracy and profitability. The findings reveal that the tuned XGBoost model demonstrates superior accuracy, precision, and recall in identifying profitable stock pairs, outperforming traditional statistical methods and other machine learning algorithms. Specifically, the XGBoost model achieved an accuracy of 95.50% and a precision of 95.34% in identifying profitable stock pairs. These results underscore the potential of XGBoost to enhance pairs trading strategies and optimize trading decisions in dynamic financial environments. However, while the XGBoost model showcases remarkable performance, it is not without limitations. Susceptibility to overfitting and reliance on input feature quality and quantity present challenges that need to be addressed. Nonetheless, the study provides valuable insights for investors and traders, suggesting avenues for optimizing trading strategies and maximizing profitability. 
Recommendations include further exploration of XGBoost's capabilities in diverse market conditions and the integration of additional data sources to enhance predictive accuracy. Moreover, the research highlights the need for continued investigation into other advanced machine learning algorithms and ensemble techniques to further improve stock pairing accuracy. Ultimately, this study contributes to advancing pairs trading strategies by providing empirical evidence of XGBoost's effectiveness, while also identifying avenues for future research and development in the field. Key Words: Pairs Trading, Correlated Stocks, Autoencoders, Self-Organizing Maps, Random Forest, Support Vector Machine, Trading Strategy, Sharpe Ratio, Maximum Drawdown, Cointegration, Backtesting, Machine Learning, XGBoost
Predicting financial inclusion and access to credit in Kenya
(Strathmore University, 2024) Tanui, C.
Financial inclusion, particularly access to credit, is a crucial aspect of economic development in Kenya. This study investigates the determinants of financial inclusion and access to credit in Kenya, employing logistic regression modeling to predict financial inclusion patterns and to construct a forecast model that can support policymakers and financial organizations in boosting financial inclusion. The study analyzed several factors, including demographics, technology adoption, financial services usage, and barriers, to assess their impact on financial inclusion and access to credit. The results revealed that the use of mobile phones and the internet, as technological indicators of financial inclusion, were the most effective predictors. Contrary to previous studies, gender was not found to significantly affect financial inclusion in this context. The machine learning model achieved an overall prediction accuracy of 90.9%. An interactive user dashboard was also developed using flexdashboard in R and hosted on the web, with visualizations and regression models to provide insights into the key factors driving financial access in Kenya. The results showed that demographics, technology adoption, financial services usage and barriers to financial inclusion were the most significant factors that impacted financial inclusion; however, there were no significant correlations between these factors and financial inclusion as a whole. This research offers insights into the causes of financial exclusion in the country and how to overcome them.
Voronoi diagrams and how they shape up offense analytics in women’s football
(Strathmore University, 2024) Mugwe, A. I.
Vilar et al. (2013) introduce a method for analyzing collective offensive and defensive behavior, finding that maintaining numerical dominance in key areas of the field is crucial for both defensive stability and offensive opportunity. The offensive tactics considered in this study focus on spotting defensive weaknesses, improving expected goals, and exploiting the opposing team’s defense when attacking. The expected goal metric is valuable to a team in two ways: on offense, it shows where to improve by creating opportunities with higher expected goals; on defense, it helps the team learn the opponent’s expected goal model and position itself so that the opponent shoots from regions with low expected goals. The expected goal metric is built using machine learning techniques such as logistic regression, bagging algorithms, and decision trees, and deep learning techniques such as Multilayer Perceptron models, to help deal with the imbalanced goals variable. The expected goals model cannot be a standalone feature and needs to be combined with other metrics to determine which key factors per team lead to higher goal-scoring opportunities. For this reason, Voronoi diagrams were used to explore how different team shapes at different moments during the game lead to more goals or chances being created, depending on the space that the team occupies.
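To illustrate how a Voronoi diagram turns player positions into "space occupied", the sketch below uses a discrete approximation: it divides the pitch into grid cells and assigns each cell to the nearest player. This is not the thesis's implementation. The pitch dimensions (105 m by 68 m), the grid step, the function name, and the player coordinates are all assumptions chosen for illustration.

```python
import math


def voronoi_area_share(players, length=105.0, width=68.0, step=1.0):
    """Approximate each team's share of pitch area under a Voronoi partition.

    players: list of (team, x, y) tuples with coordinates in metres.
    The pitch is split into step-by-step cells, and each cell is credited
    to the team of the player closest to the cell centre.
    """
    nx = int(length / step)
    ny = int(width / step)
    counts = {}
    for i in range(nx):
        for j in range(ny):
            cx = (i + 0.5) * step
            cy = (j + 0.5) * step
            team, _, _ = min(
                players, key=lambda p: math.hypot(p[1] - cx, p[2] - cy)
            )
            counts[team] = counts.get(team, 0) + 1
    total = nx * ny
    return {team: count / total for team, count in counts.items()}


# Hypothetical snapshot: three attackers pushed high against two defenders.
snapshot = [
    ("attack", 70.0, 20.0),
    ("attack", 75.0, 34.0),
    ("attack", 70.0, 48.0),
    ("defence", 90.0, 28.0),
    ("defence", 90.0, 40.0),
]
for team, share in voronoi_area_share(snapshot).items():
    print(f"{team}: {share:.1%} of pitch area")
```

Tracking this share frame by frame is one simple way to relate team shape to the chances created. An exact Voronoi construction (for example from a computational geometry library) would give cell polygons rather than grid counts.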
Music recommendation system using natural language processing
(Strathmore University, 2024) Chege, C. N.
Music recommendation systems have become increasingly popular in recent years, facilitating personalized music discovery for users worldwide. This dissertation explores the application of natural language processing (NLP) and machine learning techniques in developing a music recommendation system. The study involves building a collection of music lyrics databases, analyzing the lyrics using NLP methods (such as TF-IDF and similarity/distance metrics), and integrating these findings into a recommendation model. The cosine similarity model was evaluated and recorded an accuracy of 96%, precision of 95%, recall of 96% and F1-score of 95%. Therefore, incorporating lyrics-based features in music recommendation systems can improve user experience in consuming recommendations of similar and relevant music.
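The lyrics-based approach described above (TF-IDF weighting followed by cosine similarity) can be sketched in plain Python as follows. This is an illustrative sketch, not the dissertation's code: the whitespace tokenizer, the smoothed IDF formula, the function names, and the three toy lyric strings are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_index, docs, top_k=2):
    """Return indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine_similarity(vectors[query_index], vec), idx)
        for idx, vec in enumerate(vectors)
        if idx != query_index
    ]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy lyrics (invented for illustration).
lyrics = [
    "love you forever my love",
    "forever in love with you",
    "engine roars down the highway",
]
print(recommend(0, lyrics, top_k=1))
```

A production system would typically add stop-word removal and stemming before weighting, and would precompute the vectors once instead of on every call.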
Machine learning model for predictive maintenance on linear accelerators
(Strathmore University, 2024) Tonui, A. K. K.
Predicting machine failures is the next frontier in industrial machine maintenance. However, the ideal implementation of such a program requires fitting industrial machines with sensors that constantly monitor a machine’s vital parameters and feed them to a supervisory module for analysis and possible action. Such an undertaking demands massive capital and time investments. This is where log file mining and analysis come in. By analysing the existing machine log files of medical linear accelerators, a prediction model was developed to anticipate motor problems and notify engineers without the capital outlay of fitting new sensors on a machine. Mining of the log files yielded an imbalanced dataset containing 3.367% anomalies. This study tested three algorithms for their predictive power, with the Random Forest classifier coming out on top with 99% precision, recall and accuracy. It was closely followed by Logistic Regression and an anomaly detector, Isolation Forest, with a precision of 59%. These strong results indicate the potential of machine learning for anticipating machine failures and enabling engineers to take proactive maintenance action. Keywords: TrueBeam, Predictive maintenance, log files.
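The abstract above reports precision and recall alongside accuracy, and the 3.367% anomaly rate shows why that matters. The short sketch below, which is not taken from the thesis, computes the three metrics from confusion-matrix counts and applies them to a trivial classifier that labels every record "normal". The total of 100,000 records is a hypothetical figure chosen so that the stated anomaly rate gives whole numbers.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts.

    Precision and recall fall back to 0.0 when their denominator is zero.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical dataset size; only the 3.367% anomaly rate comes from the abstract.
n_records = 100_000
n_anomalies = 3_367

# Baseline that never flags an anomaly: no true or false positives at all.
always_normal = classification_metrics(
    tp=0, fp=0, fn=n_anomalies, tn=n_records - n_anomalies
)
for name, value in always_normal.items():
    print(f"{name}: {value:.4f}")
```

The baseline scores high on accuracy while detecting no anomalies, so on a dataset this imbalanced accuracy alone says little about a model's ability to flag failures.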
Leveraging learning analytics to optimize virtual learners’ performance
(Strathmore University, 2024) Ng'eno, B. C.
Learning analytics has gained traction globally over the years, with many institutions acknowledging its potential to optimize learning and the environments in which learning occurs. The study is structured around three primary objectives focused on optimizing virtual learners’ academic outcomes using learning analytics approaches. Firstly, it aims to identify key indicators that reliably predict students' performance within academic settings. Secondly, it seeks to examine and compare the effectiveness of different algorithms in accurately forecasting students' performance outcomes. Lastly, the research endeavours to develop and deploy a performance prediction and early alert tool utilizing R-Shiny. In this study, the performance of Logistic Regression, Naive Bayes, K-Nearest Neighbors and Support Vector Machine in predicting learners’ performance was evaluated. Utilizing 21,216 records from students at The Open University UK, the results indicated Logistic Regression as the best performing model with a precision rate of 90%; key features encompassed student demographic information and academic history. The findings of this study give invaluable insights to educational institutions on leveraging learning analytic practices for data-driven interventions to optimize and enhance student performance. In conclusion, this study not only provides a tangible solution for optimizing students’ performance but also contributes to the growing body of knowledge on learning analytic practices that can be incorporated in the education sector. Keywords: Learning analytics, machine learning, student performance, R-Shiny.
Developing an early warning system for Banana Xanthomonas Wilt (BXW) in Rwanda
(Strathmore University, 2024) Owuor, C. A.
Bananas are crucial for the agricultural economy of the African Great Lakes region, including countries like Kenya, Uganda, Tanzania, Burundi, Rwanda, and parts of the Democratic Republic of Congo, with an annual production exceeding 22 million tonnes. However, banana productivity faces significant threats from pests and diseases such as the Banana Xanthomonas Wilt (BXW), caused by the bacterium Xanthomonas campestris pv. musacearum. In this study, machine learning techniques were employed to develop an early warning system for BXW. Various classification models, including Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), and Gradient Boosting Machine (GBM), were trained and evaluated for predicting BXW occurrence. RF outperformed the other models with an accuracy of 94%, followed by GBM (89%), KNN (87%), and SVM (83%). In terms of the area under the curve (AUC), RF again led with a score of 96%, followed by GBM (95%), KNN (94%), and SVM (90%). This highlights RF’s effectiveness in creating habitat suitability maps and establishing an early warning system for BXW. The RF model was used to develop a BXW habitat suitability map for Rwanda, aiding agricultural stakeholders in identifying high-risk areas. Furthermore, a Short Message Service (SMS)-based early warning system was implemented to provide timely alerts to farmers, thereby enhancing BXW mitigation efforts. Additionally, a web portal for real-time BXW risk prediction and analysis was developed, providing accessible information to stakeholders for proactive management strategies. Keywords: BXW, Early Warning System, Rwanda, Remote Sensing, Machine Learning.
Use of machine learning (text recognition, natural language processing, and large language models) for hand-written answer sheet evaluation
(Strathmore University, 2024) Mutugi, B.
The realm of machine learning, encompassing text recognition, natural language processing and large language models, presents a transformative potential for the education sector, particularly in the evaluation of hand-written tests. This dissertation explored the use of these technologies in hand-written tests, acknowledging their prevalence and addressing inherent challenges encountered when evaluating the tests. The significant time required for evaluation often leads to delayed results and academic calendars, while the physical and mental strain on the evaluators, coupled with varying levels of skill and knowledge, can lead to inconsistencies and inaccuracies in scoring. To address these challenges, this research explored the development of a machine learning approach capable of automatically extracting questions and student responses from images of exam papers and answer sheets. The approach then assessed student responses against the corresponding exam questions using pre-trained large language models. This research adopted the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework (business understanding, data understanding, data preparation, modelling, evaluation, and deployment) to streamline the development and comparison of machine learning models. The result was a machine learning model designed to process photos of question papers and answer sheets. It extracted the text of questions and answers, facilitating the interaction between users and the technology. The textual content could then be analyzed by a pre-trained large language model, which performed the assessment and provided feedback. By enhancing the efficiency of assessments and improving the accuracy and objectivity of feedback provided to learners, this approach promised to significantly reduce the time and effort involved in the evaluation process, thereby overcoming the limitations of current practices in hand-written test evaluation.
