An Open-Architecture AI Model for CPT Coding in Breast Surgery: Development, Validation, and Prospective Testing.

Journal: Annals of Surgery
Published:
Abstract

Objective: To develop, validate, and prospectively test an open-architecture, transformer-based artificial intelligence (AI) model that extracts procedure codes from free-text breast surgery operative notes.

Background: Operative note coding is time-intensive and error-prone, leading to lost revenue and compliance risks. While AI offers potential solutions, adoption has been limited because existing systems are proprietary and closed-source, lacking transparency and standardized validation.

Methods: We included all institutional breast surgery operative notes from July 2017 to December 2023. Expert medical coders manually reviewed and validated surgeon-assigned Current Procedural Terminology (CPT) codes, establishing a reference standard. We developed and validated an AI model to predict CPT codes from operative notes using two versions of the pre-trained GatorTron clinical language model: a compact 345-million-parameter model and a larger 3.9-billion-parameter model, each fine-tuned on our labeled dataset. Performance was evaluated using the area under the precision-recall curve (AUPRC). Prospective testing was conducted on operative notes from May to October 2024.
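
To make the AUPRC metric concrete, the following is a minimal sketch of how it could be computed for a multi-label task in which each note can carry several CPT codes. It is not the authors' evaluation code. The micro-averaging scheme, the function names, and the toy labels and scores are all assumptions introduced for illustration; the abstract does not state how the per-code results were aggregated.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumes binary labels (0 or 1) and no tied scores; ties would need
    grouping by threshold, which this sketch omits.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from most to least confident.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            total += true_positives / rank
    return total / n_pos


def micro_auprc(labels: List[List[int]], scores: List[List[float]]) -> float:
    """Micro-average: pool every (note, code) pair into one ranking.

    Micro-averaging is an assumption of this sketch, not a detail from the paper.
    """
    flat_true = [v for row in labels for v in row]
    flat_score = [s for row in scores for s in row]
    return average_precision(flat_true, flat_score)


# Toy example: 3 notes x 3 candidate codes (columns are illustrative placeholders).
toy_labels = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
toy_scores = [[0.95, 0.10, 0.80], [0.20, 0.90, 0.05], [0.70, 0.30, 0.15]]

if __name__ == "__main__":
    print(micro_auprc(toy_labels, toy_scores))
```

AUPRC suits this setting because most candidate codes are absent from any given note, so a metric that focuses on the positive class is more informative than one dominated by true negatives.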

Results: Our dataset included 3,259 operative notes with 8,036 CPT codes. Surgeon coding discrepancies were present in 12% of cases (overcoding: 8%, undercoding: 10%). The AI model showed strong alignment with the reference standard (compact version AUPRC: 0.976 [0.970, 0.983], large version AUPRC: 0.981 [0.977, 0.986]) on cross-validation, outperforming surgeons (AUPRC: 0.937). Prospective testing on 268 notes confirmed strong real-world performance.
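
One plausible way to read the discrepancy figures is as set differences between the surgeon-assigned codes and the reference standard for each note. The sketch below assumes that overcoding means a surgeon-assigned code absent from the reference and undercoding means a reference code the surgeon omitted; these definitions, the function names, and the example notes are assumptions for illustration, not details confirmed by the abstract.

```python
from typing import Dict, List, Set, Tuple


def coding_discrepancy(surgeon: Set[str], reference: Set[str]) -> Dict[str, object]:
    """Compare one note's surgeon-assigned codes with the reference standard."""
    overcoded = surgeon - reference   # assigned but not supported by the reference
    undercoded = reference - surgeon  # in the reference but not assigned
    return {
        "overcoded": sorted(overcoded),
        "undercoded": sorted(undercoded),
        "discrepant": bool(overcoded or undercoded),
    }


def discrepancy_rates(notes: List[Tuple[Set[str], Set[str]]]) -> Dict[str, float]:
    """Fraction of notes with any discrepancy, any overcoding, any undercoding."""
    results = [coding_discrepancy(s, r) for s, r in notes]
    n = len(results)
    return {
        "discrepant": sum(1 for r in results if r["discrepant"]) / n,
        "overcoded": sum(1 for r in results if r["overcoded"]) / n,
        "undercoded": sum(1 for r in results if r["undercoded"]) / n,
    }


# Hypothetical notes: (surgeon-assigned codes, reference-standard codes).
example_notes = [
    ({"19301", "38525"}, {"19301", "38525"}),  # full agreement
    ({"19301"}, {"19301", "38525"}),           # undercoded
    ({"19303", "38525"}, {"19303"}),           # overcoded
    ({"19301"}, {"19303"}),                    # both over- and undercoded
]

if __name__ == "__main__":
    print(discrepancy_rates(example_notes))
```

Under these definitions a single note can be both overcoded and undercoded, so the overcoding and undercoding rates can sum to more than the overall discrepancy rate.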

Conclusions: Our open-architecture AI model demonstrated high performance in automating CPT code extraction, offering a scalable and transparent solution to improve surgical coding efficiency. Future work will assess whether AI can surpass human coders in accuracy and reliability.

Authors
Mohamad El Moheb, Kristin Putman, Olivia Sears, Melina Kibbe, K Kent, David Brenin, Allan Tsung