Odontology, 2026 (SCI-Expanded, Scopus)
This study aims to compare the clinical decision-making accuracy of different artificial intelligence (AI) models in endodontic treatment planning for patients with systemic diseases. A scenario-based, cross-sectional educational study was conducted using 40 standardized clinical scenarios representing ten commonly encountered systemic conditions affecting endodontic care. Scenarios were developed based on international endodontic and medical guidelines and reviewed by medical specialists and experienced endodontists. Four AI models (ChatGPT-5.1, Gemini 2.5 Pro, Gemini 2.5 Flash, and ChatGPT-3.5) were queried with identical, standardized prompts in fully isolated interaction environments to prevent contextual memory effects. AI-generated responses were independently evaluated by two calibrated endodontists using a predefined 10-point scoring system across four clinical domains, and clinical accuracy was categorized as high, partial, or incorrect. Nonparametric statistical analyses were performed. No statistically significant differences were observed among the AI models in overall clinical decision accuracy or in domain-specific scores (Friedman test, p > 0.05). Although categorical analysis revealed an overall difference in the proportion of high-accuracy responses (Cochran's Q, p = 0.007), post hoc comparisons did not demonstrate significant pairwise differences. Deviation analysis showed comparable proximity of all models to the expert-defined optimal decisions, with greater variability observed for Gemini 2.5 Flash. Current AI models demonstrate comparable clinical decision-making performance in endodontic scenarios involving medically compromised patients. While descriptive trends were observed, no single model consistently outperformed the others. AI systems may serve as supportive decision-making tools under professional supervision, but should not replace clinical judgment.
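The nonparametric analyses reported above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual analysis code: the scores are synthetic, the layout (40 scenarios by 4 models) follows the abstract, and the "high accuracy" cutoff of 9/10 is an assumption made for demonstration only.

```python
import numpy as np
from scipy.stats import friedmanchisquare, chi2

rng = np.random.default_rng(0)

# Hypothetical 10-point accuracy scores: 40 scenarios (rows) x 4 AI models (cols).
scores = rng.integers(6, 11, size=(40, 4)).astype(float)

# Friedman test: related-samples comparison of the four models' ordinal scores.
fried_stat, fried_p = friedmanchisquare(*scores.T)

# Binary "high accuracy" indicator (score >= 9 is an assumed cutoff, for illustration).
high = (scores >= 9).astype(int)

def cochrans_q(x):
    """Cochran's Q for n subjects (rows) x k related binary outcomes (cols)."""
    n, k = x.shape
    col = x.sum(axis=0)   # successes per model
    row = x.sum(axis=1)   # successes per scenario
    total = x.sum()
    q = (k - 1) * (k * (col ** 2).sum() - total ** 2) / (k * total - (row ** 2).sum())
    # Q is approximately chi-square distributed with k-1 degrees of freedom.
    return q, chi2.sf(q, k - 1)

q_stat, q_p = cochrans_q(high)
```

A significant Cochran's Q would typically be followed by pairwise post hoc comparisons (e.g. McNemar tests with a multiplicity correction), mirroring the post hoc step described in the abstract.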