Using large language models as decision support tools in emergency ophthalmology.
Background: Large language models (LLMs) have shown promise in various medical applications, but their potential as decision support tools in emergency ophthalmology has not yet been evaluated on real-world cases.
Objective: We assessed the performance of state-of-the-art LLMs (GPT-4, GPT-4o, and Llama-3-70b) as decision support tools in emergency ophthalmology, compared with that of human experts.
Methods: In this prospective comparative study, LLM-generated diagnoses and treatment plans for 73 anonymized emergency cases from the University Hospital of Split were evaluated against those determined by certified ophthalmologists. Two independent expert ophthalmologists graded both the LLM-generated and the human-generated reports on a 4-point Likert scale.
Results: Human experts achieved a mean score of 3.72 (SD = 0.50), while GPT-4 scored 3.52 (SD = 0.64) and Llama-3-70b scored 3.48 (SD = 0.48). GPT-4o performed lower, at 3.20 (SD = 0.81). A significant difference was found between human and LLM reports (P < 0.001), driven specifically by the gap between human and GPT-4o scores; GPT-4 and Llama-3-70b showed performance comparable to the ophthalmologists, with no statistically significant differences.
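The abstract does not name the statistical tests used; as a hedged illustration of how ordinal Likert-scale ratings like these are commonly compared, the sketch below runs a Kruskal-Wallis omnibus test followed by Bonferroni-corrected pairwise Mann-Whitney U comparisons. The group labels and scores are synthetic placeholders for illustration, not the study's data or its actual analysis.

```python
# Illustrative sketch only: the abstract does not specify the tests used.
# Assumes a Kruskal-Wallis omnibus test with pairwise Mann-Whitney U
# post-hoc comparisons, a common choice for ordinal Likert-scale ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic placeholder ratings on a 4-point Likert scale (NOT the study data).
groups = {
    "human":       rng.integers(3, 5, size=73),
    "gpt-4":       rng.integers(2, 5, size=73),
    "gpt-4o":      rng.integers(2, 5, size=73),
    "llama-3-70b": rng.integers(2, 5, size=73),
}

# Omnibus test across all report sources.
h_stat, p_omnibus = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_omnibus:.4f}")

# Each model vs. the human reports, Bonferroni-corrected for three comparisons.
n_comparisons = len(groups) - 1
for name, scores in groups.items():
    if name == "human":
        continue
    u_stat, p = stats.mannwhitneyu(groups["human"], scores, alternative="two-sided")
    print(f"human vs {name}: U = {u_stat:.1f}, "
          f"corrected P = {min(p * n_comparisons, 1.0):.4f}")
```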
Conclusions: The best-performing large language models demonstrated accuracy comparable to that of human experts as decision support tools in emergency ophthalmology, suggesting potential for integration into clinical practice.