Information from digital and human sources: A comparison of chatbot and clinician responses to orthodontic questions.

Journal: American Journal of Orthodontics and Dentofacial Orthopedics: Official Publication of the American Association of Orthodontists, Its Constituent Societies, and the American Board of Orthodontics
Abstract

Background: This study aimed to investigate whether artificial intelligence (AI)-based chatbots can be used as reliable adjunct tools in orthodontic practice by evaluating chatbot responses and comparing them to those of clinicians with varying levels of knowledge.

Methods: Large language model-based chatbots (ChatGPT-4, ChatGPT-4o, Microsoft Copilot, Google Gemini 1.5 Pro, and Claude 3.5 Sonnet) and clinicians (dental students, general dentists, and orthodontists; n = 30) were included. The groups were asked 40 true-or-false questions, and each set of answers was scored against a predetermined answer key to obtain an accuracy rate; total scores were converted to percentages. The Kruskal-Wallis test and Dunn's multiple comparison test were used to compare accuracy rates. The consistency of the answers given by the chatbots at 3 different times was assessed with Cronbach's α.
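The abstract does not include the authors' analysis code; the following is a minimal, hypothetical sketch of the described workflow (scoring against an answer key, converting to percentages, a Kruskal-Wallis omnibus test, and Cronbach's α for repeated administrations), assuming Python with NumPy and SciPy. All group names, sample sizes, and scores below are simulated for illustration and are not the study's data.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Simulated answer sheets: 1 = matches the answer key, 0 = does not (40 questions each).
groups = {
    "students":      rng.integers(0, 2, size=(10, 40)),
    "dentists":      rng.integers(0, 2, size=(10, 40)),
    "orthodontists": rng.integers(0, 2, size=(10, 40)),
}

# Accuracy per respondent, expressed as a percentage of the 40 questions.
accuracy = {name: scores.mean(axis=1) * 100 for name, scores in groups.items()}

# Omnibus comparison of accuracy rates across groups (Kruskal-Wallis).
h_stat, p_value = kruskal(*accuracy.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.3f}")
# Pairwise follow-up comparisons would use Dunn's test
# (e.g., scikit-posthocs provides an implementation).

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (questions x administrations) score matrix."""
    k = items.shape[1]                            # number of repeated administrations
    item_var = items.var(axis=0, ddof=1).sum()    # sum of per-administration variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of per-question totals
    return k / (k - 1) * (1 - item_var / total_var)

# Consistency of one chatbot's answers across 3 repeated administrations (simulated).
chatbot_runs = rng.integers(0, 2, size=(40, 3))
print(f"Cronbach alpha = {cronbach_alpha(chatbot_runs):.3f}")
```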

Results: The accuracy rates of students were significantly lower than those of Microsoft Copilot (P = 0.029), Claude 3.5 Sonnet (P = 0.023), ChatGPT-4o (P = 0.005), and orthodontists (P = 0.001). The accuracy rates of general dentists were significantly lower than those of ChatGPT-4o (P = 0.019) and orthodontists (P = 0.001). The accuracy rate of ChatGPT-4o was closest to that of orthodontists, whereas the accuracy rates of ChatGPT-4, Microsoft Copilot, Claude 3.5 Sonnet, and Google Gemini 1.5 Pro were lower than that of orthodontists but higher than that of general dentists. ChatGPT-4 demonstrated a high degree of consistency in its responses, evidenced by a high Cronbach α value (α = 0.867), whereas ChatGPT-4o (α = 0.256) and Claude 3.5 Sonnet (α = 0.256) were the least consistent chatbots.

Conclusions: Orthodontists had the highest accuracy rate, and the AI-based chatbots had higher accuracy rates than dental students and general dentists. Among the chatbots, ChatGPT-4 gave the most consistent answers, whereas ChatGPT-4o and Claude 3.5 Sonnet were the least consistent. AI-based chatbots can be useful for patient education and general orthodontic guidance, but inconsistency in their responses carries a risk of misinformation.

Authors
Ufuk Metin, Merve Goymen