Role of Model Size and Prompting Strategies in Extracting Labels from Free-Text Radiology Reports with Open-Source Large Language Models.

Journal: Journal of Imaging Informatics in Medicine
Abstract

Extracting accurate labels from radiology reports is essential for training medical image analysis models, and large language models (LLMs) show promise for automating this process. This study evaluates how model size and prompting strategies affect label extraction accuracy and downstream performance in open-source LLMs. Three open-source LLMs (Llama-3, Phi-3 mini, and Zephyr-beta) were used to extract labels from 227,827 MIMIC-CXR radiology reports. Performance was evaluated against human annotations on 2,000 MIMIC-CXR reports and by training image classifiers for pneumothorax and rib fracture detection, which were tested on the CANDID-PTX dataset (n = 19,237). LLM-based labeling outperformed the CheXpert labeler, with the best LLM achieving 95% sensitivity for fracture detection versus CheXpert's 51%. Larger models showed better sensitivity, while chain-of-thought prompting had variable effects. Image classifiers were resilient to labeling noise when tested externally. The choice of test-set labeling schema significantly affected reported performance: a classifier trained on Llama-3 chain-of-thought labels achieved AUCs of 0.96 and 0.84 for pneumothorax and fracture detection, respectively, when evaluated against human annotations, compared to 0.91 and 0.73 when evaluated on CheXpert labels. Open-source LLMs can effectively extract labels from radiology reports at scale. While larger pre-trained models generally perform better, the choice of model size and prompting strategy should be task-specific. Careful consideration of evaluation methods is critical for interpreting classifier performance.
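The label-extraction step can be illustrated with a short sketch. The example below shows one hypothetical way to prompt an open-source instruction-tuned model, with a brief chain-of-thought instruction, to return binary pneumothorax and rib-fracture labels for a single report. The model ID, prompt wording, and output schema are illustrative assumptions rather than the authors' exact setup, and the code presumes a recent Hugging Face transformers release that accepts chat-style inputs to the text-generation pipeline.

```python
# Minimal sketch of LLM-based label extraction from one radiology report.
# The checkpoint, prompt text, and two-label schema are illustrative
# assumptions, not the exact configuration used in the study.
import json
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed checkpoint
)

def extract_labels(report_text: str) -> dict:
    """Ask the model for binary labels, allowing a brief reasoning step first."""
    messages = [
        {"role": "system",
         "content": "You label chest radiograph reports. "
                    "Think step by step, then finish with JSON only."},
        {"role": "user",
         "content": f"Report:\n{report_text}\n\n"
                    'Return {"pneumothorax": 0 or 1, "rib_fracture": 0 or 1}.'},
    ]
    out = generator(messages, max_new_tokens=256, do_sample=False)
    reply = out[0]["generated_text"][-1]["content"]
    # Keep only the final JSON object, discarding any preceding reasoning text.
    start, end = reply.rfind("{"), reply.rfind("}") + 1
    return json.loads(reply[start:end])

labels = extract_labels("Small right apical pneumothorax. No acute rib fracture.")
print(labels)  # e.g. {"pneumothorax": 1, "rib_fracture": 0}
```

In a full pipeline, the same call would be applied to every report and the resulting labels used as weak supervision for the downstream image classifiers.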

Authors
Bardia Khosravi, Theo Dapamede, Frank Li, Zvipo Chisango, Anirudh Bikmal, Sara Garg, Babajide Owosela, Amirali Khosravi, Mohammadreza Chavoshi, Hari Trivedi, Cody Wyles, Saptarshi Purkayastha, Bradley Erickson, Judy Gichoya