Assessment of the Accuracy of Modern Artificial Intelligence Chatbots in Responding to Endodontic Queries

Çakar, Melis; Avcı, Ayşe; Düzgün, Salih; Aslan, Tuğrul; Hekimoğlu, Kübra

doi:10.1111/aej.70012

Çakar M., Avcı A. T. E., Düzgün S., Aslan T., Hekimoğlu K. N.

Australian Endodontic Journal, vol. 51, pp. 732-739, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 51
Publication Date: 2025
DOI: 10.1111/aej.70012
Journal Name: Australian Endodontic Journal
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, MEDLINE
Pages: 732-739
Keywords: ChatGPT, clinical decision-making, diagnostic accuracy, Gemini, large language models
Affiliated with Erciyes University: Yes

Abstract

This study aims to compare the accuracy of modern AI chatbots, including Gemini 1.5 Flash, Gemini 1.5 Pro, ChatGPT-3.5 and ChatGPT-4, in responding to endodontic questions and supporting clinicians. Forty yes/no questions covering 12 endodontic topics were formulated by three experts. Each question was presented to the AI models on the same day, with a new chat session initiated for each. The agreement between chatbot responses and expert consensus was assessed using Cohen's kappa test (p < 0.05). ChatGPT-3.5 demonstrated the highest accuracy (80%), followed by ChatGPT-4 (77.5%), Gemini 1.5 Pro (72.5%) and Gemini 1.5 Flash (60%). The agreement levels ranged from weak (ChatGPT models) to minimal (Gemini Flash). The findings indicate variability in chatbot performance, with ChatGPT models outperforming Gemini. However, reliance on AI-generated responses for clinical decision-making remains questionable. Future studies should incorporate more complex clinical scenarios and broader analytical approaches to enhance the assessment of AI chatbots in endodontics.
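
The two measures named in the abstract, accuracy and Cohen's kappa, can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `accuracy` and `cohen_kappa` are hypothetical, and the ten yes/no answers below are toy data invented for illustration, not the study's 40 questions or its results. Kappa is computed in its standard form, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def accuracy(reference, responses):
    """Fraction of responses that match the reference answers."""
    if not reference or len(reference) != len(responses):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(ref == resp for ref, resp in zip(reference, responses))
    return matches / len(reference)


def cohen_kappa(reference, responses):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(reference)
    p_observed = accuracy(reference, responses)

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    ref_counts = Counter(reference)
    resp_counts = Counter(responses)
    labels = set(ref_counts) | set(resp_counts)
    p_expected = sum(ref_counts[lab] * resp_counts[lab] for lab in labels) / n**2

    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (illustrative only, NOT taken from the study).
expert_consensus = ["yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]
chatbot_answers = ["yes", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no"]

print(f"accuracy = {accuracy(expert_consensus, chatbot_answers):.2f}")
print(f"kappa    = {cohen_kappa(expert_consensus, chatbot_answers):.2f}")
```

In this toy case 8 of 10 answers match, and because both raters answer "yes" half the time, chance agreement is 0.5, so kappa is lower than raw accuracy. This is why a model can score a high percentage of correct answers and still show only modest chance-corrected agreement.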