Accuracy of deep learning-based AI models for early caries lesion detection: the influence of annotation quality and reference choice

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Objectives: To assess how different annotation methods used during AI model training affect the accuracy of early caries lesion detection, and how the choice of evaluation reference standard changes the measured performance of AI models.

Methods: Multiple AI caries lesion segmentation models were trained on the ACTA-DIRECT dataset using annotations from (1) single dentists, (2) aggregated strategies (majority vote, consensus meetings, STAPLE), and (3) micro-CT-based methods. Model accuracy was evaluated using two approaches: (1) comparison against micro-CT-based annotations and (2) comparison against the training-matched annotations. Statistical significance of differences in model diagnostic accuracy across annotation strategies was assessed using the McNemar test.
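The abstract does not describe how the aggregation or the statistical test was implemented. As a minimal illustration of two of the techniques named in the Methods, the sketch below shows a pixel-wise majority vote over binary annotation masks and an exact two-sided McNemar test on paired per-case correctness flags. The function names, the flat-list mask representation, and the tie-breaking rule (ties resolve to "no lesion") are assumptions made for this example, not details taken from the study.

```python
from math import comb


def majority_vote(masks):
    """Pixel-wise majority vote over binary masks.

    `masks` is a list of equal-length lists of 0/1 values, one per annotator
    (a hypothetical flat representation of a segmentation mask).
    A pixel is labelled 1 only if strictly more than half of the annotators
    marked it; ties therefore resolve to 0 (an assumption of this sketch).
    """
    n = len(masks)
    return [1 if 2 * sum(pixels) > n else 0 for pixels in zip(*masks)]


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test for two models scored on the same cases.

    `correct_a` and `correct_b` are per-case flags (truthy = model was correct).
    Returns (b, c, p_value), where b counts cases only model A got right and
    c counts cases only model B got right.
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    n = b + c
    if n == 0:
        # No discordant pairs: the models are indistinguishable on this data.
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Only the discordant pairs (cases where exactly one model is correct) enter the McNemar statistic, which is why the test suits comparisons of models evaluated on the same set of cases.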

Results: There was no statistically significant difference in diagnostic accuracy among AI models when compared to micro-CT-based annotations. However, diagnostic accuracy was statistically significantly higher when the AI models were evaluated against the training-matched annotations.

Conclusion: Our findings indicate a strong influence of reference standards on AI model evaluation. While annotation strategies during training did not significantly affect AI accuracy in caries lesion segmentation, evaluation was subject to bias when models were tested against different reference standards.

Clinical relevance: AI-based tools for caries detection are becoming common in dentistry. This study shows that how these models are evaluated can significantly impact perceived performance. Clinicians and developers should ensure that evaluation standards are independent and clinically relevant to avoid overestimating AI’s diagnostic abilities and to build trust for real-world use and regulatory approval.

Original language: English
Article number: 598
Pages (from-to): 1-20
Number of pages: 20
Journal: Clinical Oral Investigations
Volume: 29
Issue number: 12
Publication status: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Keywords

  • Artificial intelligence (AI)
  • Dental caries detection
  • Dental radiology
  • Image annotation
  • Proximal caries lesion
  • Training sources
