AI-based classification of three common malignant tumors in neuro-oncology: A multi-institutional comparison of machine learning and deep learning methods.
Objective: To determine whether machine learning (ML) or deep learning (DL) pipelines perform better in AI-based three-class classification of glioblastoma (GBM), intracranial metastatic disease (IMD) and primary CNS lymphoma (PCNSL).
Methods: This retrospective analysis included 502 cases for training (208 GBM, 67 PCNSL and 227 IMD), with external validation on 86 cases (27:27:32). Multiparametric MRI sequences (T1W, T2W, FLAIR, DWI and T1-CE) were co-registered, resampled, denoised and intensity-normalized, followed by semiautomatic 3D segmentation of the enhancing tumor (ET) and peritumoral region (PTR). Several ML pipelines and 3D convolutional neural networks (3D-CNNs) were built from sequence-specific masks as well as combinations of masks. All pipelines were trained and evaluated with 5-fold nested cross-validation on internal data, followed by external validation, with multi-class AUC as the performance metric.
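As an illustration of the evaluation metric, the following is a minimal sketch of one common definition of multi-class AUC: the unweighted (macro) average of one-vs-rest AUCs. The abstract does not state which multi-class AUC variant was used, so this choice is an assumption. The function names and the toy probabilities are hypothetical and are not taken from the study.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    y_true: Sequence[str],
    proba: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> float:
    """Macro-averaged one-vs-rest AUC (assumed variant, see lead-in)."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]  # predicted probability of class k
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy data (hypothetical): two cases per class, columns ordered as `classes`.
classes = ("GBM", "PCNSL", "IMD")
y_true = ["GBM", "GBM", "PCNSL", "PCNSL", "IMD", "IMD"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.1, 0.5],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.5, 0.2, 0.3],
]
toy_auc = macro_ovr_auc(y_true, proba, classes)
```

In this toy case the per-class AUCs are 0.875 (GBM), 1.0 (PCNSL) and 0.875 (IMD), so the macro average is about 0.917. A prevalence-weighted average or a one-vs-one scheme would be equally plausible readings of "multi-class AUC".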
Results: Two ML models achieved similar performance on the test set, one using T2-ET and T2-PTR masks (AUC: 0.885, 95% CI: [0.816, 0.935]) and another using T1-CE-ET and FLAIR-PTR masks (AUC: 0.878, CI: [0.804, 0.930]). The best-performing DL model achieved an AUC of 0.854 (CI: [0.774, 0.914]) on external data using T1-CE-ET and T2-PTR masks, followed by a model derived from T1-CE-ET, ADC-ET and FLAIR-PTR masks (AUC: 0.851, CI: [0.772, 0.909]).
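The abstract does not say how the confidence intervals were obtained. One widely used option is a percentile bootstrap over test cases, sketched below for a binary AUC to keep the example short. The same resampling loop would apply to a multi-class AUC. The function names, the toy labels and scores, and the bootstrap settings are all hypothetical.

```python
import random
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based binary AUC; ties between classes count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("resample is missing one of the classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping label and score paired.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
        except ValueError:
            continue  # skip degenerate resamples that contain a single class
    if not stats:
        raise RuntimeError("no valid bootstrap resamples")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy data (hypothetical): 10 positive and 10 negative cases with overlap.
labels = [1] * 10 + [0] * 10
scores = [0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.45, 0.4, 0.3,
          0.7, 0.5, 0.45, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
ci_low, ci_high = bootstrap_ci(labels, scores, n_boot=500, seed=0)
```

With only 86 external cases, intervals of this kind are expected to be fairly wide, which is consistent with the overlapping intervals reported for the ML and DL models.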
Conclusions: ML- and DL-derived pipelines achieved similar performance. A T1-CE mask was used in three of the top four models overall. Additionally, all four models used at least one PTR-derived mask, from either T2W or FLAIR.