Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes.

Journal: AMIA ... Annual Symposium Proceedings. AMIA Symposium
Published:
Abstract

This study compared large language models (LLMs) with Bidirectional Encoder Representations from Transformers (BERT) models at identifying medication names, routes, and frequencies in publicly available free-text ophthalmology progress notes from 480 patients. A total of 5,520 lines of annotated text were divided into training (N=3,864), validation (N=1,104), and test (N=552) sets. We evaluated ChatGPT-3.5, ChatGPT-4, PaLM 2, and Gemini on identifying these medication entities, and we fine-tuned BERT, BioBERT, ClinicalBERT, DistilBERT, and RoBERTa for the same task using the training set. On the test set, ChatGPT-4 achieved the best performance (macro-averaged F1 0.962); among the BERT models, BioBERT performed best (macro-averaged F1 0.875). Modern LLMs outperformed fine-tuned BERT models even on the highly domain-specific task of identifying ophthalmic medication information in progress notes, which suggests that LLMs have potential for medical named entity recognition in support of patient care.
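
The macro-averaged F1 reported above gives each entity type (medication name, route, frequency) equal weight regardless of how often it occurs. The following is a minimal sketch of that calculation, not the authors' evaluation code: the abstract does not state how predicted entities were matched against annotations, and the function names and counts below are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return F1 for one entity type from true-positive, false-positive,
    and false-negative counts. Returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-entity-type F1 scores."""
    scores = [f1_from_counts(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts per entity type; not data from the study.
example_counts = {
    "medication_name": (90, 10, 10),
    "route": (45, 5, 15),
    "frequency": (40, 10, 10),
}

if __name__ == "__main__":
    for entity, (tp, fp, fn) in example_counts.items():
        print(f"{entity}: F1 = {f1_from_counts(tp, fp, fn):.3f}")
    print(f"macro-averaged F1 = {macro_f1(example_counts):.3f}")
```

Because the mean is unweighted, a rarer entity type such as route affects the score as much as a frequent one such as medication name, which is one reason macro-averaging is often used when entity types are imbalanced.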

Authors
Iyad Majid, Vaibhav Mishra, Rohith Ravindranath, Sophia Wang