Two artificial intelligence chatbots, GPT-4o and Gemini 2.5 Pro, performed at a level comparable to advanced endodontic trainees when tested on simulated American Board of Endodontics (ABE) oral examinations, according to a study published in the Journal of Endodontics on 26 February 2026. Researchers at Texas A&M College of Dentistry designed the test to assess clinical reasoning and decision-making rather than simple recall, using three endodontic cases with 20 open-ended questions each.

How the chatbots performed

Both systems scored highly on a 0-3 scale: Gemini 2.5 Pro achieved a mean score of 2.83 and GPT-4o a mean of 2.73. In an independent assessment, two board-certified endodontists rated most responses as acceptable to excellent. There was no statistically significant difference between the two models in clinical validity or overall performance. Gemini 2.5 Pro was more consistent across the three scenarios, while GPT-4o's performance varied more by case type.

Limitations and educational use

The study's lead author, Dr Poorya Jalali, stressed that these results should not be over-interpreted. The chatbots cannot perform clinical examination, interpret radiographs in real settings, or diagnose independently. They performed well because they received written prompts and detailed radiographic descriptions. A real ABE examination involves live, timed interaction with examiners and independent radiographic interpretation.

The findings suggest AI chatbots are best used as educational supplements rather than replacements for human instruction. They could help students and residents practise answering clinical questions, test their knowledge, and compare their reasoning with model answers. Future research will explore whether these tools can help design high-quality examination questions.