From population- to patient-based prediction of in-hospital mortality in heart failure using machine learning.
Administrative data may facilitate risk prediction in heart failure inpatients. In this short report, we present machine learning models that predict in-hospital mortality for individual patients from this widely available data source. We examined inpatient cases with a main discharge diagnosis of heart failure that were hospitalized between 1 January 2016 and 31 December 2018 in one of 86 German Helios hospitals. Comorbidities were defined by ICD-10 codes from administrative data. The data set was randomly split 75%/25% into portions for model development and testing. Five algorithms were evaluated: logistic regression [generalized linear model (GLM)], random forest (RF), gradient boosting machine (GBM), single-layer neural network (NNET), and extreme gradient boosting (XGBoost). After model tuning, the area under the receiver operating characteristic curve (ROC AUC) was calculated for each model, and the ROC AUCs were compared with DeLong's test. A total of 59 074 inpatient cases (mean age 77.6 ± 11.1 years, 51.9% female, 89.4% NYHA Class III/IV) were included, and in-hospital mortality was 6.2%. In the test data set, ROC AUCs were 0.853 [95% confidence interval (CI) 0.842-0.863] for GLM, 0.851 (95% CI 0.840-0.862) for RF, 0.855 (95% CI 0.844-0.865) for GBM, 0.836 (95% CI 0.823-0.849) for NNET, and 0.856 (95% CI 0.846-0.867) for XGBoost. XGBoost outperformed all models except GBM. Machine learning-based processing of administrative data enables the creation of well-performing prediction models for in-hospital mortality in heart failure patients.
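The evaluation step described above (a random 75/25 split, ROC AUC on the held-out portion, and DeLong's test for two models scored on the same cases) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the two "models" are simulated risk scores rather than fitted algorithms, and all function and variable names (`components`, `delong_test`, `score_a`, `score_b`) are hypothetical. Only the 6.2% event rate and the 75/25 split are taken from the text.

```python
import math
import random


def psi(x, y):
    """Mann-Whitney kernel: 1 if the death outranks the survivor, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def components(scores, labels):
    """Return (AUC, V10, V01); labels are 1 = died in hospital, 0 = survived."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01


def cov(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """DeLong's test for two correlated AUCs measured on the same cases."""
    auc_a, v10_a, v01_a = components(scores_a, labels)
    auc_b, v10_b, v01_b = components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var_diff = (
        (cov(v10_a, v10_a) + cov(v10_b, v10_b) - 2 * cov(v10_a, v10_b)) / m
        + (cov(v01_a, v01_a) + cov(v01_b, v01_b) - 2 * cov(v01_a, v01_b)) / n
    )
    if var_diff <= 0:  # identical rankings: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Synthetic cohort (assumption): 2000 cases with a 6.2% event rate.
random.seed(0)
n_cases = 2000
labels = [1] * 124 + [0] * (n_cases - 124)
signal = [1.5 * y + random.gauss(0, 1) for y in labels]
score_a = signal                                      # simulated model A
score_b = [s + random.gauss(0, 1) for s in signal]    # noisier simulated model B

# Random 75/25 split; only the 25% test portion is used for evaluation here.
idx = list(range(n_cases))
random.shuffle(idx)
cut = int(0.75 * n_cases)
train_idx, test_idx = idx[:cut], idx[cut:]

y_test = [labels[i] for i in test_idx]
a_test = [score_a[i] for i in test_idx]
b_test = [score_b[i] for i in test_idx]

auc_a, auc_b, z_stat, p_value = delong_test(a_test, b_test, y_test)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z_stat:.2f}, p = {p_value:.4f}")
```

The pairwise kernel makes the computation quadratic in the number of deaths and survivors, which is acceptable for a sketch but would be slow on a cohort of roughly 59 000 cases; rank-based implementations of the same statistic in established statistical packages would be the more practical choice there.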