A Machine Learning Model for Predicting Breast Cancer Recurrence and Supporting Personalized Treatment Decisions Through Comprehensive Feature Selection and Explainable Ensemble Learning.

Journal: Cancer Management and Research
Abstract

This study evaluates the effectiveness of a machine learning model that integrates least absolute shrinkage and selection operator (LASSO) feature selection with ensemble learning for predicting recurrence risk and supporting personalized treatment decisions in breast cancer patients. Clinical data from 1,131 breast cancer patients (1,056 nonrecurrent and 75 recurrent) were collected from the electronic health record system of Kaohsiung Medical University Hospital. After preprocessing and standardization, LASSO was applied for feature selection. An ensemble learning model was then built from multiple machine learning algorithms, with SHAP (Shapley additive explanations) used for interpretability. The ensemble model achieved an area under the receiver operating characteristic curve (AUC) of 0.817, outperforming the best single model (AUC 0.711) and demonstrating improved predictive accuracy and stability. LASSO identified six key predictors: regional lymph node positivity, ER status, Ki-67, lymphovascular invasion, tumor size, and age at diagnosis. SHAP analysis enhanced transparency by quantifying the contribution of each feature to a patient's predicted recurrence risk, which improves clinical understanding of the model's output. Relative to the single models evaluated, this LASSO-enhanced ensemble model improves both the accuracy and the interpretability of breast cancer recurrence prediction. By explaining individualized recurrence risks through SHAP analysis, the model supports more precise, data-driven clinical decision-making. These findings demonstrate its potential as a clinical decision support tool for guiding personalized treatment strategies and contributing to more effective breast cancer management.
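To make the interpretability step concrete, the following minimal sketch computes exact Shapley values for one patient, which is the quantity that SHAP approximates. It is not the authors' implementation. The logistic risk function, its coefficients, the feature encodings, and the single-reference baseline are all hypothetical and chosen only for illustration; the study's ensemble and the SHAP library itself are not used here.

```python
from itertools import combinations
from math import exp, factorial

# The six predictors named in the abstract (identifier spellings are ours).
FEATURES = [
    "lymph_node_positive", "er_status", "ki67",
    "lymphovascular_invasion", "tumor_size", "age_at_diagnosis",
]

# Hypothetical coefficients for a toy logistic risk model; NOT from the study.
WEIGHTS = {
    "lymph_node_positive": 0.9,
    "er_status": -0.6,
    "ki67": 0.5,
    "lymphovascular_invasion": 0.7,
    "tumor_size": 0.4,
    "age_at_diagnosis": -0.2,
}
INTERCEPT = -2.6


def risk(x):
    """Toy recurrence probability from a logistic model over the six features."""
    z = INTERCEPT + sum(WEIGHTS[name] * x[name] for name in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline point.

    Features outside a coalition are set to their baseline value. All 2^(n-1)
    coalitions are enumerated per feature, which is only feasible for small n.
    """
    n = len(FEATURES)
    phi = {}
    for target in FEATURES:
        others = [name for name in FEATURES if name != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                without_target = {
                    name: x[name] if name in known else baseline[name]
                    for name in FEATURES
                }
                with_target = dict(without_target)
                with_target[target] = x[target]
                total += weight * (model(with_target) - model(without_target))
        phi[target] = total
    return phi


# Hypothetical reference patient and patient to explain. Binary features are
# 0/1; continuous features are assumed standardized (mean 0, unit variance).
baseline = {
    "lymph_node_positive": 0, "er_status": 1, "ki67": 0.0,
    "lymphovascular_invasion": 0, "tumor_size": 0.0, "age_at_diagnosis": 0.0,
}
patient = {
    "lymph_node_positive": 1, "er_status": 0, "ki67": 1.2,
    "lymphovascular_invasion": 1, "tumor_size": 0.8, "age_at_diagnosis": 0.0,
}

phi = shapley_values(risk, patient, baseline)
for name in sorted(phi, key=lambda k: -abs(phi[k])):
    print(f"{name:>24s}  {phi[name]:+.4f}")
print(f"risk(patient) - risk(baseline) = {risk(patient) - risk(baseline):+.4f}")
```

The per-feature contributions sum to the difference between the patient's predicted risk and the baseline risk, and a feature whose value equals the baseline (age at diagnosis in this toy example) receives a contribution of zero. Exact enumeration is practical here only because there are six features; for larger models, approximation methods such as those in the SHAP library are the usual choice.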

Authors
Tsair-fwu Lee, Jun-ping Shiau, Chia-hui Chen, Wen-ping Yun, Cheng-shie Wuu, Yu-jie Huang, Shyh-an Yeh, Hui-chun Chen, Pei-ju Chao
Relevant Conditions

Breast Cancer