Benchmarking Large Language Models for Extraction of International Classification of Diseases Codes from Clinical Documentation.

Journal: MedRxiv : The Preprint Server For Health Sciences
Abstract

Healthcare reimbursement and coding depend on accurate extraction of International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes from clinical documentation, and attempts to automate this task have had limited success. This study aimed to evaluate the performance of large language models (LLMs) in extracting ICD-10-CM codes from unstructured inpatient notes and to benchmark them against a human coder. We compared the performance of GPT-3.5, GPT-4, Claude 2.1, Claude 3, Gemini Advanced, and Llama 2-70b. We presented deidentified inpatient notes from American Health Information Management Association Vlab authentic patient cases to the LLMs and the human coder for extraction of ICD-10-CM codes, using a standard prompt for the LLMs. The human coder analyzed the same notes using the 3M Encoder, adhering to the 2022 ICD-10-CM coding guidelines. We analyzed 50 inpatient notes, comprising 23 history and physicals and 27 progress notes. The human coder identified 165 unique codes, with a median of 4 codes per note. The LLMs extracted varying median numbers of codes per note: GPT-3.5, 7; GPT-4, 6; Claude 2.1, 6; Claude 3, 8; Gemini Advanced, 5; and Llama 2-70b, 11. GPT-4 performed best, though its agreement with the human coder was poor: 15.2% for extraction of full ICD-10-CM codes and 26.4% for extraction of category-level ICD-10-CM codes. Current LLMs perform poorly at extracting ICD-10-CM codes from inpatient notes when compared against a human coder.
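The abstract does not specify exactly how agreement was computed; the sketch below is one plausible illustration, assuming agreement is the share of the human coder's codes that the LLM also produced, and that "category" means the first three characters of an ICD-10-CM code (e.g., E11 for E11.9). The helper names and the example codes are hypothetical, not taken from the study.

```python
def category(code: str) -> str:
    """ICD-10-CM category: the first three characters before the decimal
    (e.g., 'E11' for 'E11.9')."""
    return code.split(".")[0][:3]


def agreement(llm_codes, coder_codes):
    """Percentage of the human coder's codes also extracted by the LLM,
    at the full-code level and at the three-character category level."""
    llm, coder = set(llm_codes), set(coder_codes)
    full_pct = len(llm & coder) / len(coder) * 100

    llm_cats = {category(c) for c in llm}
    coder_cats = {category(c) for c in coder}
    cat_pct = len(llm_cats & coder_cats) / len(coder_cats) * 100
    return full_pct, cat_pct


# Illustrative (fabricated) codes for one note, not data from the study:
full, cat = agreement(
    ["E11.9", "I10", "J18.9"],          # codes an LLM might return
    ["E11.65", "I10", "N17.9", "J18.9"]  # codes the human coder assigned
)
# full-code agreement is lower than category-level agreement here,
# mirroring the pattern the study reports (15.2% vs. 26.4% for GPT-4).
```

Category-level matching is more forgiving because an LLM that picks the right disease category but the wrong subclassification (E11.9 instead of E11.65) still scores a category match, which is consistent with the higher category-level figure reported.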

Authors
Ashley Simmons, Kullaya Takkavatakarn, Megan Mcdougal, Brian Dilcher, Jami Pincavitch, Lukas Meadows, Justin Kauffman, Eyal Klang, Rebecca Wig, Gordon Smith, Ali Soroush, Robert Freeman, Donald Apakama, Alexander Charney, Roopa Kohli Seth, Girish Nadkarni, Ankit Sakhuja