Robust statistical learning for optimal classification of imbalanced data

dc.contributor.author: Juma, Samuel Wanyonyi
dc.date.accessioned: 2022-06-13T06:17:38Z
dc.date.available: 2022-06-13T06:17:38Z
dc.date.issued: 2021
dc.description: A Research Thesis Submitted to the Graduate School in partial fulfillment of the requirements for the Award of Master of Science Degree in Statistical Sciences at Strathmore University
dc.description.abstract: Neurobiological disorders such as Learning Disabilities (LD) are increasingly becoming a major concern in the education and health sectors; hence, precise identification of these disorders is critical. While neuropsychological assessments play an important role in diagnosis, there are limited conventional methodologies for test administration, scoring and interpretation of results. Consequently, children are frequently misclassified due to the imprecise distinction between children with learning disabilities and those with learning difficulties. This research sought to apply statistical and Machine Learning (ML) approaches to strengthen the LD diagnostic process. It addresses the challenges of learning from imbalanced data, a characteristic often associated with LD data due to the low prevalence of the disorder. Imbalanced data poses a challenge in designing efficient ML solutions since standard classification models assume fairly distributed classes. The study used an experimental design to identify a suitable base learner and corrective technique to tackle the challenge of imbalanced data. The statistical experiments were based on secondary data obtained from a Baseline Survey on Learning Disabilities conducted by the Kenya Institute of Special Education in 2019. It was found that the Support Vector Machine (SVM) is the best base learner for imbalanced data, with the highest classification efficiency compared to other classification models. For data with high dimensionality, the classification power of the Artificial Neural Network (ANN) was better than that of the SVM, despite the need for significantly higher computational effort. When data dimensionality was reduced, the classification power of the ANN decreased significantly. The SVM was also found to be a more flexible model whose classification power is least affected by changes in data dimensionality. Both Adaptive Boosting (AdaBoost) and Adaptive Synthetic Sampling (ADASYN) were found to perform well in tackling imbalanced data, with AdaBoost performing slightly better, although the difference was not statistically significant. The study concludes that SVM and ANN can be used to model highly imbalanced data to achieve the highest classification accuracy with respect to the minority class, and that ADASYN and AdaBoost can be used jointly to build a more robust corrective algorithm for highly imbalanced data. (An illustrative sketch of the corrective-technique-plus-base-learner approach appears after the metadata fields below.)
dc.identifier.uri: http://hdl.handle.net/11071/12814
dc.language.iso: en
dc.publisher: Strathmore University
dc.subject: Statistical learning
dc.subject: Optimal classification
dc.subject: Imbalanced data
dc.title: Robust statistical learning for optimal classification of imbalanced data
dc.type: Thesis
</dc_fields_end>
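The abstract describes pairing a corrective technique for class imbalance (ADASYN oversampling or AdaBoost) with a base learner such as an SVM. The following is a minimal illustrative sketch of that general approach using scikit-learn and imbalanced-learn on synthetic imbalanced data; the dataset, parameters, and metrics are assumptions for illustration only and do not reproduce the thesis experiments or the KISE survey data.

```python
# Illustrative sketch only: ADASYN + SVM versus AdaBoost on synthetic imbalanced data.
# All settings are assumptions for demonstration, not the thesis configuration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import ADASYN

# Simulate a highly imbalanced binary problem (roughly 5% minority class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Corrective technique 1: ADASYN synthesises extra minority-class samples,
# then an SVM base learner is trained on the rebalanced data.
X_res, y_res = ADASYN(random_state=42).fit_resample(X_train, y_train)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_res, y_res)
print("ADASYN + SVM")
print(classification_report(y_test, svm.predict(X_test), digits=3))

# Corrective technique 2: AdaBoost reweights misclassified (often minority) examples
# at each boosting round, without resampling the training data.
ada = AdaBoostClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print("AdaBoost")
print(classification_report(y_test, ada.predict(X_test), digits=3))
```

Minority-class recall and F1 in the classification report are the figures to compare, since overall accuracy is dominated by the majority class in imbalanced problems.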
Files
Original bundle
Name: Robust statistical learning for optimal classification of imbalanced data.pdf
Size: 2.67 MB
Format: Adobe Portable Document Format
Description: Full-text thesis
License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon at submission