Bridging the Coding Gap: Assessing Large Language Models for Accurate Modifier Assignment in Craniofacial Operative Notes
Background: Accurate medical coding is vital for proper health care management and reimbursement, especially in craniofacial surgery. Although Current Procedural Terminology (CPT) codes standardize the reporting of clinical services, modifiers are often required to capture procedural complexities and ensure fair compensation. Applying modifiers accurately can be time-intensive and error-prone, which poses challenges for coding professionals. Recent advances in natural language processing (NLP) and large language models (LLMs) have shown promise in automating coding tasks. However, the ability of LLMs to identify the need for CPT modifiers from operative notes remains unexplored.
Methods: This study evaluates the capability of LLMs, including ChatGPT and Google Gemini, to identify the CPT modifiers required by craniofacial operative notes. The authors collected notes whose coding involved common modifiers, such as Modifier 22 (increased procedural services), and compared model outputs with expert-coded results. The study focused on key modifiers relevant to craniofacial surgery, and performance was assessed by precision.
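The abstract does not state exactly how precision was computed. As a minimal sketch, assuming each note's model output and its expert coding are treated as sets of (CPT code, modifier) pairs, and that a suggestion counts only on an exact match of both elements, precision is the share of suggested pairs that also appear in the expert coding. The function names `precision` and `micro_precision` and the placeholder codes `CPT-A` to `CPT-D` are hypothetical illustrations, not the authors' method or real CPT codes.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: one code assignment is a (CPT code, modifier) pair.
# An empty string stands for "no modifier".
Assignment = Tuple[str, str]


def precision(predicted: Set[Assignment], expert: Set[Assignment]) -> float:
    """Fraction of model-suggested pairs that exactly match the expert coding.

    Returns 0.0 when the model suggests nothing, to avoid dividing by zero.
    """
    if not predicted:
        return 0.0
    true_positives = len(predicted & expert)
    return true_positives / len(predicted)


def micro_precision(cases: Iterable[Tuple[Set[Assignment], Set[Assignment]]]) -> float:
    """Pool matches and suggestions across all notes before dividing."""
    matched = 0
    suggested = 0
    for predicted, expert in cases:
        matched += len(predicted & expert)
        suggested += len(predicted)
    return matched / suggested if suggested else 0.0


if __name__ == "__main__":
    # Placeholder notes: (model suggestions, expert coding).
    note_1 = ({("CPT-A", "22"), ("CPT-B", "")}, {("CPT-A", "22")})
    note_2 = ({("CPT-C", "")}, {("CPT-D", "22")})
    print(precision(*note_1))                           # 0.5
    print(round(micro_precision([note_1, note_2]), 3))  # 0.333
```

Exact matching is the strictest choice; it gives no credit for a code that falls within the right CPT range but misses a procedural detail, which is the kind of partial correctness the results below describe.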
Results: Of the 10 operative reports evaluated, neither ChatGPT nor Gemini correctly identified both the CPT code and the modifier for any case. However, ChatGPT more often produced responses containing partially correct CPT and modifier codes, and it was the only model to assign a correct modifier alongside a partially correct CPT code, which it did in one instance. Both models produced multiple responses with partially or completely inaccurate codes, including 4 entirely incorrect submissions each. Notably, some LLM suggestions fell within the appropriate CPT range but did not account for procedural specifics such as the inclusion of a graft or the depth of tissue debridement.
Conclusions: This study demonstrates the potential of LLMs as an ancillary tool for CPT modifier identification in craniofacial surgery. By reducing administrative burdens and improving accuracy, these tools could enhance efficiency and reimbursement for complex procedures. Future directions include refining LLM capabilities and evaluating their generalizability across other surgical subspecialties.