Random splicing assisted deep learning for breast cancer cell line classification via Raman spectroscopy.

Journal: Computational And Structural Biotechnology Journal
Published:
Abstract

Raman spectroscopy extracts rich biochemical information from a single cell, demonstrating significant potential for precise cancer identification. While machine learning enhances spectral analysis efficiency, conventional models remain constrained by data volume. Here, we developed Random Splicing-Convolutional Neural Network (RS-CNN), a deep learning framework that addresses data scarcity through spectral concatenation. By randomly splicing Raman spectra from the same cell line, RS-CNN enhances distinctive spectral features while simultaneously expanding dataset size and improving signal quality. Validation across six breast cancer cell lines demonstrated RS-CNN's superiority over five benchmark models (SVM, LDA, PCA-SVM, PCA-LDA, CNN). With 450 spectra per cell line, RS-CNN achieved 98.63 % classification accuracy, compared to conventional models' accuracies of around 85 %. Under data-limited conditions (100 spectra per cell line), RS-CNN maintained 91.47 % accuracy, outperforming CNN's 70.83 %. RS-CNN's generalizability was further validated on an independently acquired dataset, achieving at least 94 % classification accuracy. SHAP analysis suggested that the spectral region around 980 cm⁻¹ was significant for cancer diagnosis, while the 1158-1160 cm⁻¹ and 1603-1607 cm⁻¹ regions were particularly valuable for distinguishing between cancer subtypes. These findings establish RS-CNN as a robust analytical model for clinical Raman diagnostics, particularly valuable in applications requiring high accuracy with limited samples.
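The random splicing step described above can be illustrated with a minimal sketch. The abstract states only that spectra from the same cell line are randomly spliced by concatenation; everything else here is an assumption made for illustration, including the function name `random_splice`, the number of spectra joined per sample (`n_splice`), the number of spliced samples generated per class (`n_out_per_class`), and the toy data. It is not the authors' implementation.

```python
import numpy as np

def random_splice(spectra, labels, n_splice=3, n_out_per_class=10, seed=0):
    """Build spliced samples by concatenating randomly chosen spectra
    that share the same class label (hypothetical helper, not from the paper).

    spectra: array of shape (n_spectra, n_points)
    labels:  array of shape (n_spectra,)
    Returns (spliced, spliced_labels), where spliced has shape
    (n_classes * n_out_per_class, n_splice * n_points).
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    out_x, out_y = [], []
    for cls in np.unique(labels):
        pool = spectra[labels == cls]  # all spectra of one cell line
        for _ in range(n_out_per_class):
            # Draw n_splice distinct spectra from this class only.
            idx = rng.choice(len(pool), size=n_splice, replace=False)
            # Join them end to end into one longer input vector.
            out_x.append(pool[idx].reshape(-1))
            out_y.append(cls)
    return np.stack(out_x), np.array(out_y)

# Toy data (assumed): 2 classes, 5 spectra each, 4 points per spectrum.
demo_x = np.arange(40).reshape(10, 4)
demo_y = np.array([0] * 5 + [1] * 5)
aug_x, aug_y = random_splice(demo_x, demo_y, n_splice=3, n_out_per_class=10)
print(aug_x.shape)
```

Because each spliced sample is a fresh random combination, the number of distinct training inputs can far exceed the number of measured spectra, which is one plausible reading of how the approach expands the dataset. The spliced vectors would then be fed to a classifier such as a 1D CNN.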

Authors
Yiheng Liu, Junfeng Liu, Jiayi Wan, Hongke Hao, Guangxing Liu, Xia Huang
Relevant Conditions

Breast Cancer