Bridging the Gap in Neonatal Care: Evaluating AI Chatbots for Chronic Neonatal Lung Disease and Home Oxygen Therapy Management.

Journal: Pediatric Pulmonology
Published:
Abstract

Objective: To evaluate the accuracy and comprehensiveness of eight free, publicly available large language model (LLM) chatbots in addressing common questions related to chronic neonatal lung disease (CNLD) and home oxygen therapy (HOT).

Methods: Twenty CNLD- and HOT-related questions were curated across nine domains. Responses from ChatGPT-3.5, Google Bard, Bing Chat, Claude 3.5 Sonnet, ERNIE Bot 3.5, and GLM-4 were generated and evaluated by three experienced neonatologists using Likert scales for accuracy and comprehensiveness. Two updated models (ChatGPT-4o mini and Gemini 2.0 Flash Experimental) were subsequently tested to account for rapid technological advancement. Statistical analyses included ANOVA, Kruskal-Wallis tests, and intraclass correlation coefficients.

Results: Bing Chat and Claude 3.5 Sonnet demonstrated superior performance, with the highest mean accuracy scores (5.78 ± 0.48 and 5.75 ± 0.54, respectively) and comprehensiveness scores (2.65 ± 0.58 and 2.80 ± 0.41, respectively). In subsequent testing, Gemini 2.0 Flash Experimental and ChatGPT-4o mini achieved comparably high performance. Performance varied across domains, with all models excelling in "equipment and safety protocols" and "caregiver support." ERNIE Bot 3.5 and GLM-4 showed self-correction capabilities when prompted.

Conclusions: LLMs show promise in providing accurate CNLD/HOT information. However, performance variability and the risk of misinformation necessitate expert oversight and continued refinement before widespread clinical implementation.

Authors
Weiqin Liu, Hong Wei, Lingling Xiang, Yin Liu, Chunyi Wang, Ziyu Hua