Comparison of PLS and LASSO features selection techniques on cancer classification

Publisher

Strathmore University

Abstract

Recent reports from the World Health Organization (WHO) highlight cancer as a leading cause of global mortality, with a significant impact on women, who frequently experience breast, lung, colorectal, thyroid, and ovarian cancers. Accurate diagnostic tools are therefore essential. This study compares three supervised machine learning classifiers (XGBoost, Random Forest, and a 1D convolutional neural network (1D-CNN)), each paired with one of two feature selection methods, the least absolute shrinkage and selection operator (LASSO) and partial least squares (PLS), to identify the most effective approach for diagnosing common women’s cancers. RNA-Seq gene expression datasets for breast, colon, ovarian, lung, and thyroid cancers were obtained from the Genomic Data Commons Data Portal. Both methods identified significant features: LASSO selected 173 genes, as outlined in the anchor paper, and PLS selected 162 genes. All models achieved high accuracy (>99%) in cancer classification, with XGBoost combined with LASSO demonstrating superior performance on multiple metrics. Notable genes such as TG, COL1A1, CTSB, CLU, and MGP emerged as crucial markers for classification. The study underscores the importance of precise feature selection in developing reliable machine learning classifiers for cancer diagnosis and recommends LASSO over PLS in conjunction with XGBoost. These findings highlight the role of accurate feature selection in improving diagnostic precision and, ultimately, patient outcomes in oncology.

Keywords: Women’s cancer, RNA-Seq gene expression, XGBoost, Random Forest, Genomic Data Commons
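
As a rough illustration of how LASSO-based feature selection works in general, the sketch below fits an L1-penalized linear model by cyclic coordinate descent and keeps the features whose coefficients stay nonzero. It is not the thesis's code. The synthetic data, the eight hypothetical "gene" columns, the penalty `alpha`, and all function names are assumptions made for this example, and a continuous response stands in for the cancer class labels used in the study.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes j's own term.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w

# Synthetic expression matrix (assumption): 100 samples x 8 hypothetical genes.
random.seed(0)
n_samples, n_genes = 100, 8
X = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
# Only hypothetical genes 0 and 3 drive the response; the rest are noise.
y = [3.0 * row[0] - 2.0 * row[3] + random.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
# Features with a nonzero coefficient are the ones LASSO "selects".
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected gene indices:", selected)
```

A larger `alpha` drives more coefficients to exactly zero and so selects fewer features. In a setting like the one the abstract describes, the retained columns would then be passed to a downstream classifier.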

Description

Full-text thesis

Citation

Lunalo, J. O. (2025). Comparison of PLS and LASSO features selection techniques on cancer classification [Strathmore University]. https://hdl.handle.net/11071/16470
