Comparison of PLS and LASSO features selection techniques on cancer classification
Publisher
Strathmore University
Abstract
Recent reports from the World Health Organization (WHO) identify cancer as a leading cause of death worldwide, with a significant impact on women, who are frequently affected by breast, lung, colorectal, thyroid, and ovarian cancers. Accurate diagnostic tools are therefore essential. This study compares three supervised machine learning classifiers, XGBoost, Random Forest, and a one-dimensional convolutional neural network (1D-CNN), each paired with one of two feature selection methods, the least absolute shrinkage and selection operator (LASSO) and partial least squares (PLS), to identify the most effective approach for diagnosing common women’s cancers. RNA-Seq gene expression datasets for breast, colon, ovarian, lung, and thyroid cancers were obtained from the Genomic Data Commons Data Portal. Both methods identified significant features: LASSO selected 173 genes, as outlined in the anchor paper, and PLS selected 162 genes. All models achieved high accuracy (>99%) in cancer classification, with XGBoost combined with LASSO performing best on multiple metrics. Notable genes such as TG, COL1A1, CTSB, CLU, and MGP emerged as crucial markers for classification. The study underscores the importance of precise feature selection in developing reliable machine learning classifiers for cancer diagnosis and recommends LASSO over PLS in conjunction with XGBoost. These findings highlight the role of accurate feature selection in improving the precision of cancer diagnosis and, ultimately, patient outcomes in oncology.
Keywords: Women’s cancer, RNA-Seq gene expression, XGBoost, Random Forest, Genomic Data Commons
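The abstract relies on LASSO to reduce thousands of expression features to a small gene set. As a minimal illustration of why an L1 penalty performs selection, the sketch below fits a LASSO regression by cyclic coordinate descent on synthetic data. It is not the thesis pipeline: the data, the penalty value, and all function and variable names are hypothetical, and the thesis applies LASSO to RNA-Seq profiles for multi-class classification rather than to a toy regression.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - Xb while every coefficient is zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)
            ) / n
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Synthetic stand-in for an expression matrix (assumption): 200 samples and
# 10 features, of which only features 0 and 3 influence the response.
rng = random.Random(0)
n_samples, n_features = 200, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]

coef = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected feature indices:", selected)
```

The soft-thresholding step sets a coefficient exactly to zero whenever the feature's correlation with the remaining residual falls below the penalty, which is what turns the fit into a feature selector. The surviving indices would then be passed to a downstream classifier, and the surviving coefficients are shrunk toward zero relative to the true effects.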
Description
Full-text thesis
Citation
Lunalo, J. O. (2025). Comparison of PLS and LASSO features selection techniques on cancer classification [Strathmore University]. https://hdl.handle.net/11071/16470