A meta-analysis on the reliability of comparative judgement

San Verhavert, Renske Bouwer, Vincent Donche, Sven De Maeyer

Research output: Contribution to Journal › Article › Academic › peer-review



Comparative Judgement (CJ) aims to improve the quality of performance-based assessments by letting multiple assessors judge pairs of performances. CJ is generally associated with high levels of reliability, but there is also large variation in reliability between assessments. This study investigates which assessment characteristics influence the level of reliability. A meta-analysis was performed on the results of 49 CJ assessments. Results show that the number of comparisons affected the level of reliability. In addition, the probability of reaching an asymptote in the reliability, i.e., the point where large effort is needed to only slightly increase the reliability, was larger for experts and peers than for novices. For a reliability level of .70, between 10 and 14 comparisons per performance are needed. This rises to 26 to 37 comparisons for a reliability of .90.
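The CJ procedure described above can be sketched in code. The following is a minimal, illustrative simulation only: it assumes a Bradley-Terry model for the pairwise judgements and the common Scale Separation Reliability (SSR) formula (observed variance minus mean squared standard error, over observed variance); the item counts and parameter values are hypothetical and not taken from the meta-analysis itself.

```python
# Hypothetical sketch of a CJ assessment: simulate pairwise judgements
# under a Bradley-Terry model, estimate qualities, and compute SSR.
import math
import random

random.seed(42)

N_ITEMS = 20                # performances to be judged (illustrative)
COMPARISONS_PER_ITEM = 12   # within the 10-14 band reported for SSR ~ .70

# Hypothetical true qualities on a logit scale.
true_theta = [random.gauss(0.0, 1.0) for _ in range(N_ITEMS)]

# Each performance initiates COMPARISONS_PER_ITEM judgements against
# random opponents; P(i beats j) = 1 / (1 + exp(theta_j - theta_i)).
wins = [[0] * N_ITEMS for _ in range(N_ITEMS)]
for i in range(N_ITEMS):
    opponents = [k for k in range(N_ITEMS) if k != i]
    for j in random.choices(opponents, k=COMPARISONS_PER_ITEM):
        p_i = 1.0 / (1.0 + math.exp(true_theta[j] - true_theta[i]))
        if random.random() < p_i:
            wins[i][j] += 1
        else:
            wins[j][i] += 1

# Estimate qualities by gradient ascent on the Bradley-Terry
# log-likelihood, then centre the logit scale.
theta = [0.0] * N_ITEMS
for _ in range(300):
    grad = [0.0] * N_ITEMS
    for i in range(N_ITEMS):
        for j in range(N_ITEMS):
            n_ij = wins[i][j] + wins[j][i]
            if n_ij:
                p = 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
                grad[i] += wins[i][j] - n_ij * p
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]
mean_t = sum(theta) / N_ITEMS
theta = [t - mean_t for t in theta]

# Standard errors from the Fisher information, then
# SSR = (observed variance - mean squared SE) / observed variance.
se2 = []
for i in range(N_ITEMS):
    info = 0.0
    for j in range(N_ITEMS):
        n_ij = wins[i][j] + wins[j][i]
        if n_ij:
            p = 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
            info += n_ij * p * (1.0 - p)
    se2.append(1.0 / info)
var_obs = sum(t * t for t in theta) / (N_ITEMS - 1)
ssr = (var_obs - sum(se2) / N_ITEMS) / var_obs
print(f"SSR = {ssr:.2f}")
```

Increasing `COMPARISONS_PER_ITEM` shrinks the standard errors and pushes the SSR towards its asymptote, mirroring the diminishing returns the abstract describes.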

Original language: English
Pages (from-to): 541-562
Number of pages: 22
Journal: Assessment in Education: Principles, Policy, & Practice
Issue number: 5
Early online date: 12 Apr 2019
Publication status: Published - 2019


  • Comparative Judgement
  • Scale Separation Reliability
  • meta-analysis
  • performance assessment


