Prediction of acute and chronic kidney diseases during the post-covid-19 pandemic with machine learning models: utilizing national electronic health records in the US.

Journal: EBioMedicine

Published: October 17, 2024

Abstract

Background: COVID-19 has been linked to acute kidney injury (AKI) and chronic kidney disease (CKD), but machine learning (ML) models predicting these risks post-pandemic have been absent. We aimed to use large electronic health records (EHR) and ML algorithms to predict the incidence of AKI and CKD during the post-pandemic period, assess the necessity of including COVID-19 infection history as a predictor, and develop a practical webpage application for clinical use.

Methods: National EHR data from TriNetX, emulating a prospective cohort of 104,565 patients from 07/01/2022 to 03/31/2024, were used. A total of 69 baseline variables were included, with demographics, comorbidities, lab test results, vital signs, medication histories, hospitalization visits, and COVID-19-related variables. Prediction windows of 1 month and 1 year were defined to assess AKI and CKD incidence. Eight machine learning models, primarily including extreme gradient boosting (XGBoost), neural network, and random forest (RF), were applied. Cross-validation and model tuning were conducted during the training process. Model performance was evaluated using six metrics, including the area under the receiver-operating-characteristic curve (AUROC). A combination of model-driven, data-driven, and clinical-driven methods was employed to identify the final models. An application with the final models was built using the R Shiny framework.
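As an illustration of the headline metric named above, AUROC can be read as the probability that a randomly chosen patient who develops the outcome receives a higher predicted risk than a randomly chosen patient who does not. The following is a minimal sketch of that definition in plain Python, not the study's code. The function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from itertools import product


def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for patients with the outcome, 0 for patients without it.
    scores: predicted risks, in the same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Count each positive/negative pair: 1 if the positive case is ranked
    # higher, 0.5 for a tie, 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four patients, two of whom develop the outcome.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in the number of patients, so it suits small examples only. For cohorts of the size described here, a rank-based implementation from a standard library would be the practical choice.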

Results: The final models incorporated 9 variables, primarily eGFR, number of inpatient visits, and number of COVID-19 infections. XGBoost demonstrated the best performance for predicting the incidence of AKI in 1 month (AUROC = 0.803), AKI in 1 year (AUROC = 0.799), and CKD in 1 year (AUROC = 0.894). Random forest (RF) was selected for predicting the incidence of CKD in 1 month (AUROC = 0.896). A comparison of AUROC with and without COVID-19 infection as a predictor confirmed its importance in the models. The final models were translated into a convenient tool to facilitate their use in clinical settings.

Conclusions: Our study demonstrates the applicability of large national EHR data for developing high-performance machine learning models to predict AKI and CKD risks in the post-COVID-19 period. Incorporating the number of COVID-19 infections in the past year improved prediction performance and should be considered in future models for kidney disease prediction. A user-friendly application was created to support clinicians in risk assessment and surveillance.

Funding: Artificial Intelligence and Biomedical Informatics Pilot Funding, Penn State College of Medicine.

Authors

Yue Zhang, Nasrollah Ghahramani, Runjia Li, Vernon Chinchilli, Djibril Ba
