Adequacy of ChatGPT responses to frequently asked questions about shoulder arthroplasty: is it an appropriate adjunct for patient education?

Journal: JSES International
Abstract

Artificial intelligence (AI) large language models, such as ChatGPT, have numerous novel applications in medicine, one of which is patient education. Several studies in other specialties have investigated the adequacy of ChatGPT-generated responses to frequently asked questions (FAQs) by patients, with largely positive results. The purpose of this study is to evaluate the accuracy and clarity of ChatGPT-generated responses to website-derived FAQs relating to shoulder arthroplasty.

Ten questions regarding shoulder arthroplasty were compiled from the websites of 5 leading academic institutions. ChatGPT's responses to these questions were rated by 2 orthopedic surgeons on a scale from 1 to 4, corresponding to "excellent response not requiring clarification," "satisfactory requiring minimal clarification," "satisfactory requiring moderate clarification," and "unsatisfactory requiring substantial clarification," respectively. A senior shoulder arthroplasty surgeon arbitrated disagreements. Cohen's kappa coefficient was utilized to assess inter-rater agreement.

After arbitration, only one response was rated as "excellent response not requiring clarification." Nine of 10 responses required clarification: 4 were rated as "satisfactory requiring minimal clarification," 5 were rated as "satisfactory requiring moderate clarification," and none were rated as "unsatisfactory requiring substantial clarification." The kappa coefficient was 0.516 (P = .027), indicating moderate agreement between reviewers.

When queried with FAQs regarding shoulder arthroplasty, ChatGPT's responses were all deemed "satisfactory," but most required clarification. This may be due to the nuances of anatomic vs. reverse shoulder replacement. Thus, patients may find benefit in using ChatGPT to guide whether or not they should seek medical attention, but its answers to treatment-related questions are limited in detail and accuracy.
While ChatGPT is a helpful tool for starting provider-patient conversations, it does not appear to provide quality, verified, data-driven answers at this time and should be used cautiously, in conjunction with provider-patient discussions. Although the use of ChatGPT in answering FAQs is limited at the moment, orthopedic surgeons should continue to monitor its use as a patient education tool, as well as the expanding use of AI as a possible adjunct in clinical decision-making.
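For readers unfamiliar with the agreement statistic used above, Cohen's kappa corrects raw percent agreement between two raters for the agreement expected by chance from each rater's rating frequencies. The sketch below illustrates the computation on hypothetical ratings using the study's 1-4 scale; the rating vectors are invented for illustration and are not the study's actual data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters scoring ten responses on the 1-4 scale.
a = [1, 2, 2, 3, 3, 2, 3, 1, 2, 3]
b = [1, 2, 3, 3, 3, 2, 2, 1, 2, 3]
print(cohens_kappa(a, b))  # chance-corrected agreement for these ratings
```

By convention (Landis and Koch), kappa values between 0.41 and 0.60, such as the 0.516 reported here, are interpreted as moderate agreement.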

Authors
Christopher Johnson, Krishna Mandalia, Jason Corban, Kaley Beall, Sarav Shah