Multicenter Development and Prospective Validation of eCARTv5: A Gradient-Boosted Machine-Learning Early Warning Score.
Background: Early detection of clinical deterioration using machine-learning early warning scores may improve outcomes. However, most implemented scores were developed using logistic regression, underwent only retrospective validation, and were not tested in important subgroups.
Objective: In this multicenter retrospective and prospective observational study, we aimed to develop and prospectively validate a gradient-boosted machine model (eCARTv5) for identifying clinical deterioration on the wards. Model development included all adult patients admitted to the inpatient medical-surgical wards at seven hospitals in three health systems (2006-2022). Retrospective (2009-2023) and prospective (2023-2024) external validation included all adult patients admitted to the inpatient medical-surgical wards at 21 hospitals from three health systems. Predictor variables (demographics, vital signs, documentation, and laboratory values) were used in a gradient-boosted trees algorithm to predict ICU transfer or death in the next 24 hours. The developed model (eCARTv5) was compared with the Modified Early Warning Score (MEWS), the National Early Warning Score (NEWS), and eCARTv2 using the area under the receiver operating characteristic curve (AUROC).
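The AUROC used to compare the scores can be read as the probability that a randomly chosen observation followed by ICU transfer or death within 24 hours receives a higher score than a randomly chosen observation without that outcome. The following is a minimal sketch of that calculation using the rank-sum (Mann-Whitney) form, not the study's own code; the function name `auroc` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores count as half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one event and one non-event")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: 1 = ICU transfer or death within 24 hours, 0 = neither.
labels = [0, 0, 1, 0, 1, 1]
model_probability = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]  # continuous model output
point_score = [1, 2, 2, 3, 1, 4]                    # integer point-based score

auroc_model = auroc(model_probability, labels)
auroc_points = auroc(point_score, labels)
```

Because the statistic depends only on the ranking of scores, a continuous model probability and an integer point-based score can be compared on the same observations without rescaling either one.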
Results: The development cohort included 901,491 admissions, the retrospective validation cohort included 1,769,461 admissions, and the prospective validation cohort included 205,946 admissions. In retrospective validation, eCARTv5 had the highest AUROC (0.834 [95% CI, 0.834-0.835]), followed by eCARTv2 (0.775 [95% CI, 0.775-0.776]), NEWS (0.766 [95% CI, 0.766-0.767]), and MEWS (0.704 [95% CI, 0.703-0.704]). eCARTv5's performance remained high (AUROC ≥0.80) across a range of patient demographics and clinical conditions and during prospective validation.
Conclusions: We developed eCARTv5, which performed better than eCARTv2, NEWS, and MEWS retrospectively, prospectively, and across a range of subgroups. These results served as the foundation for Food and Drug Administration clearance of eCARTv5 for identifying deterioration in hospitalized ward patients.