An Open-Architecture AI Model for CPT Coding in Breast Surgery: Development, Validation, and Prospective Testing.
Objective: To develop, validate, and prospectively test an open-architecture, transformer-based artificial intelligence (AI) model that extracts procedure codes from free-text breast surgery operative notes.
Background: Operative note coding is time-intensive and error-prone, leading to lost revenue and compliance risks. AI offers potential solutions, but adoption has been limited because available systems are proprietary and closed-source, lacking transparency and standardized validation.
Methods: We included all institutional breast surgery operative notes from July 2017 to December 2023. Expert medical coders manually reviewed and validated surgeon-assigned Current Procedural Terminology (CPT) codes to establish a reference standard. We developed and validated an AI model to predict CPT codes from operative notes using two versions of the pre-trained GatorTron clinical language model: a compact 345-million-parameter model and a larger 3.9-billion-parameter model, each fine-tuned on our labeled dataset. Performance was evaluated on cross-validation using the area under the precision-recall curve (AUPRC). Prospective testing was conducted on operative notes from May to October 2024.
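To make the evaluation metric concrete, the following is a minimal sketch of one common way to compute AUPRC for a multi-label coding task: average precision over the pooled (note, code) pairs, often called micro-averaging. The abstract does not state how the curve was integrated or whether scores were pooled across codes, so the averaging scheme, the function names (`average_precision`, `micro_auprc`), and the code strings in the example are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Set


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Simplification: tied scores are ranked in input order rather than
    being grouped into a single threshold.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            # Precision at the rank where recall increases.
            total += true_positives / rank
    return total / n_pos


def micro_auprc(
    reference: List[Set[str]],
    scores: List[Dict[str, float]],
    code_list: List[str],
) -> float:
    """Pool every (note, code) pair, then compute a single average precision."""
    y_true: List[int] = []
    y_score: List[float] = []
    for truth, note_scores in zip(reference, scores):
        for code in code_list:
            y_true.append(1 if code in truth else 0)
            # Assumption: a code with no predicted score is treated as 0.0.
            y_score.append(note_scores.get(code, 0.0))
    return average_precision(y_true, y_score)


# Illustrative data only: two notes, three placeholder code strings.
codes = ["19301", "38525", "19120"]
reference = [{"19301", "38525"}, {"19120"}]
scores = [
    {"19301": 0.9, "38525": 0.6, "19120": 0.2},
    {"19301": 0.3, "38525": 0.1, "19120": 0.8},
]
example_auprc = micro_auprc(reference, scores, codes)
```

AUPRC suits this setting because most codes are absent from most notes, so a metric that ignores true negatives is less flattered by class imbalance than accuracy would be.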
Results: Our dataset included 3,259 operative notes with 8,036 CPT codes. Surgeon coding discrepancies were present in 12% of cases (overcoding: 8%, undercoding: 10%). The AI model showed strong alignment with the reference standard (compact version AUPRC: 0.976 [0.970, 0.983], large version AUPRC: 0.981 [0.977, 0.986]) on cross-validation, outperforming surgeons (AUPRC: 0.937). Prospective testing on 268 notes confirmed strong real-world performance.
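The discrepancy rates above can be illustrated with a short sketch that compares each surgeon-assigned code set against the reference set. The abstract does not give the exact definitions used, so this assumes one plausible per-note definition: a note is overcoded if the surgeon listed any code absent from the reference, and undercoded if the surgeon omitted any reference code. Under that assumption a single note can fall into both categories. The function name `discrepancy_rates` and the placeholder codes are hypothetical.

```python
from typing import Dict, List, Set


def discrepancy_rates(
    surgeon: List[Set[str]], reference: List[Set[str]]
) -> Dict[str, float]:
    """Per-note rates of any discrepancy, overcoding, and undercoding."""
    if len(surgeon) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(reference)
    any_count = over_count = under_count = 0
    for assigned, truth in zip(surgeon, reference):
        has_over = bool(assigned - truth)   # extra code not in the reference
        has_under = bool(truth - assigned)  # reference code the surgeon missed
        over_count += has_over
        under_count += has_under
        any_count += has_over or has_under
    return {
        "any": any_count / n,
        "overcoding": over_count / n,
        "undercoding": under_count / n,
    }


# Illustrative data only: four notes with placeholder codes "A" and "B".
surgeon_codes = [{"A"}, {"A", "B"}, {"A"}, {"B"}]
reference_codes = [{"A"}, {"A"}, {"A", "B"}, {"A"}]
rates = discrepancy_rates(surgeon_codes, reference_codes)
```

In this toy example the fourth note is both overcoded and undercoded, so the two sub-rates sum to more than the overall discrepancy rate.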
Conclusions: Our open-architecture AI model demonstrated high performance in automating CPT code extraction, offering a scalable and transparent solution to improve surgical coding efficiency. Future work will assess whether AI can surpass human coders in accuracy and reliability.