Comparison of PLS and LASSO features selection techniques on cancer classification

dc.contributor.author: Lunalo, J. O.
dc.date.accessioned: 2026-04-25T13:04:31Z
dc.date.issued: 2025
dc.description: Full-text thesis
dc.description.abstract: Recent reports from the World Health Organization (WHO) highlight cancer as the leading cause of global mortality, with a significant impact on women, who frequently experience breast, lung, colorectal, thyroid, and ovarian cancers. Accurate diagnostic tools are therefore essential. This study compares three supervised machine learning classifiers, XGBoost, Random Forest, and a 1D convolutional neural network (1D-CNN), each combined with one of two feature selection methods, the least absolute shrinkage and selection operator (LASSO) and partial least squares (PLS), to identify the most effective approach for diagnosing common women’s cancers. RNA-Seq gene expression datasets for breast, colon, ovarian, lung, and thyroid cancers were obtained from the Genomic Data Commons Data Portal. Both methods identified significant features: LASSO selected 173 genes, as outlined in the anchor paper, and PLS selected 162 genes. All models achieved high accuracy (>99%) in cancer classification, with XGBoost combined with LASSO performing best on multiple metrics. Notable genes such as TG, COL1A1, CTSB, CLU, and MGP emerged as crucial markers for classification. The study underscores the importance of precise feature selection in developing reliable machine learning classifiers for cancer diagnosis and recommends LASSO over PLS in conjunction with XGBoost. Accurate feature selection can improve the precision of cancer diagnosis and, ultimately, patient outcomes in oncology.
Keywords: Women’s cancer, RNA-Seq gene expression, XGBoost, Random Forest, Genomic Data Commons
dc.identifier.citation: Lunalo, J. O. (2025). Comparison of PLS and LASSO features selection techniques on cancer classification [Strathmore University]. https://hdl.handle.net/11071/16470
dc.identifier.uri: https://hdl.handle.net/11071/16470
dc.language.iso: en
dc.publisher: Strathmore University
dc.title: Comparison of PLS and LASSO features selection techniques on cancer classification
dc.type: Thesis

Files

Original bundle

Name: Comparison of PLS and LASSO features selection techniques on cancer classification.pdf
Size: 9.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission