The concise machine learning prediction models for suicide attempt in China: Based on demographic and social factors.
Background: Machine learning (ML) methods have recently been recommended for predicting suicide attempts (SA). However, few studies have reported prediction models for Chinese populations built with multiple ML methods, and previous models have often relied on complex variables and were not concise enough, which limited their application and extrapolation. This study aimed to explore whether ML approaches can improve the prediction of suicide attempts and to establish a more concise model.
Methods: The dataset came from an extensive case-control survey in China. Demographic variables, the GSS Suicide Attitude Scale and the Beck Scale for Suicide Ideation were used to collect data through face-to-face interviews. RF, MLR, XGBoost, AdaBoost and LightGBM were used to build the models, and evaluation indices were used to select the optimal prediction model. The ML methods were run in R 4.2.1.
Results: The AUCs of the three ML methods were all larger than 0.75, indicating good predictive performance for all three models. LightGBM had the highest AUC in the training dataset (0.9199), and RF had the highest AUC in the test dataset (0.8136). Overall, RF had the highest AUC (0.8638). RF and XGBoost had the highest sensitivity in the training and test datasets, respectively (80.00% and 84.83%). LightGBM and logistic regression had the highest specificity in the training and test datasets, respectively (89.00% and 82.14%). LightGBM and RF had the highest accuracy in the training and test datasets, respectively (83.76% and 74.75%). Overall, the RF model had the best accuracy (79.09%), followed by LightGBM (77.18%). LightGBM had the highest positive predictive value.
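The evaluation indices reported above (sensitivity, specificity, accuracy, positive predictive value and AUC) can be illustrated with a minimal sketch. The study itself ran its models in R 4.2.1; Python is used here only for illustration. The function names `evaluation_indices` and `auc`, the 0.5 classification threshold and the toy labels and scores are all assumptions made for this example and are not taken from the study data.

```python
from typing import Dict, Sequence


def evaluation_indices(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute indices from binary labels, where 1 = SA case and 0 = non-SA case.

    Assumes both classes are present and at least one positive prediction exists,
    so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "sensitivity": tp / (tp + fn),        # share of SA cases detected
        "specificity": tn / (tn + fp),        # share of non-SA cases correctly excluded
        "accuracy": (tp + tn) / len(pairs),   # share of all cases classified correctly
        "ppv": tp / (tp + fp),                # positive predictive value
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random SA case scores higher than a random
    non-SA case (ties count as one half). Quadratic in sample size, which is
    fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(evaluation_indices(toy_labels, toy_pred))
print(auc(toy_labels, toy_scores))
```

Because sensitivity, specificity, accuracy and positive predictive value all depend on the chosen threshold while AUC does not, two models can rank differently on these indices, as RF and LightGBM do above.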
Conclusions: In this comparison, the ML algorithms performed comparably well in distinguishing SA cases from non-SA cases. Overall, the RF and LightGBM prediction models performed best in the current study. The choice of algorithm may depend on the research aim and the dataset. This study suggests that optimized strategies, for instance combined or joint model strategies and ensemble models, should be used to improve accuracy and detection yield. Further research and validation studies are required in these respects.
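One common form of the combined or ensemble strategy mentioned above is soft voting, in which the predicted probabilities of several models are averaged before a threshold is applied. The sketch below is an illustration only and is not the method used in the study. It is written in Python rather than the R used by the authors, and the function name `soft_vote`, the equal default weights, the 0.5 threshold and the toy probabilities are all hypothetical.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-model predicted probabilities for each subject.

    model_probs holds one probability list per model, all of the same length.
    If weights is None, every model contributes equally.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_subjects = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_subjects)
    ]


# Hypothetical predicted SA probabilities for three subjects from two models.
rf_probs = [0.8, 0.4, 0.3]
lgbm_probs = [0.6, 0.7, 0.1]

combined = soft_vote([rf_probs, lgbm_probs])
predicted = [1 if p >= 0.5 else 0 for p in combined]  # assumed 0.5 threshold
print(combined, predicted)
```

Whether such an ensemble actually improves accuracy or detection yield over the single best model would need to be checked on held-out data.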