Comparative Judgement (CJ) aims to improve the quality of performance-based assessments by having multiple assessors judge pairs of performances. CJ is generally associated with high reliability, but reliability also varies widely between assessments. This study investigates which assessment characteristics influence the level of reliability. A meta-analysis was performed on the results of 49 CJ assessments. Results show an effect of the number of comparisons on the level of reliability. In addition, the probability of reaching an asymptote in the reliability, i.e. the point where considerable extra effort yields only a slight increase in reliability, was larger for experts and peers than for novices. For a reliability level of .70, between 10 and 14 comparisons per performance are needed; this rises to 26 to 37 comparisons for a reliability of .90.
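The mechanics behind these figures can be illustrated with a small simulation. This is a minimal sketch, not the study's method: it assumes a Bradley-Terry model for the pairwise judgements and computes Scale Separation Reliability as (observed variance of the estimates − mean squared standard error) / observed variance. All quantities (item counts, learning rate, seed) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_comp_per_item = 30, 12          # illustrative: ~12 comparisons per performance
true_quality = rng.normal(0, 1, n_items)

# Simulate random pairings; the winner is drawn with Bradley-Terry probability.
pairs = []
for _ in range(n_items * n_comp_per_item // 2):
    i, j = rng.choice(n_items, size=2, replace=False)
    p_i_wins = 1 / (1 + np.exp(-(true_quality[i] - true_quality[j])))
    pairs.append((i, j) if rng.random() < p_i_wins else (j, i))

# Fit Bradley-Terry ability estimates by gradient ascent on the log-likelihood.
theta = np.zeros(n_items)
for _ in range(500):
    grad = np.zeros(n_items)
    for w, l in pairs:
        p = 1 / (1 + np.exp(-(theta[w] - theta[l])))
        grad[w] += 1 - p
        grad[l] -= 1 - p
    theta += 0.05 * grad
    theta -= theta.mean()                  # identifiability constraint

# Standard errors from the diagonal of the observed Fisher information.
info = np.zeros(n_items)
for w, l in pairs:
    p = 1 / (1 + np.exp(-(theta[w] - theta[l])))
    info[w] += p * (1 - p)
    info[l] += p * (1 - p)
se = 1 / np.sqrt(info)

# Scale Separation Reliability: share of observed variance that is "true" variance.
ssr = (theta.var(ddof=1) - np.mean(se**2)) / theta.var(ddof=1)
print(f"SSR = {ssr:.2f}")
```

Raising `n_comp_per_item` in this sketch pushes the SSR upward with diminishing returns, which is the asymptote behaviour the abstract describes.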
Number of pages: 22
Journal: Assessment in Education: Principles, Policy, & Practice
Early online date: 12 Apr 2019
Publication status: Published - 2019
- Comparative Judgement
- Scale Separation Reliability
- performance assessment