TY - JOUR
T1 - Do we agree? Interrater reliability in education [Zijn we het eens? Interbeoordelaarsbetrouwbaarheid in de pedagogiek en het onderwijs]
AU - van der Ark, L.A.
AU - ten Hove, D.
PY - 2019
Y1 - 2019
N2 - © 2019 Vereniging voor Onderwijsresearch (VOR). All Rights Reserved. Ratings and assessments are daily practices in education: teachers decide whether a student's behaviour should be punished, teachers grade students' theses, and child-protection officers give complex assessments of juvenile delinquents. The interrater reliability (IRR) expresses the level of agreement among raters. We discuss three questions. The first question, "How to determine the IRR?", cannot be answered unequivocally. Many coefficients are available to estimate the IRR, but they often produce different results. The choice among coefficients depends on the research goal, the research design, and personal preference. The answer to the second question, "How to increase the IRR?", entails increasing the raters' expertise and improving the quality of the items, rubrics, and procedure. To answer the third question, "When is the IRR high enough?", we present a method to transform benchmarks for one IRR coefficient into benchmarks for another, but methodological research is required to provide a better answer to this question.
AB - © 2019 Vereniging voor Onderwijsresearch (VOR). All Rights Reserved. Ratings and assessments are daily practices in education: teachers decide whether a student's behaviour should be punished, teachers grade students' theses, and child-protection officers give complex assessments of juvenile delinquents. The interrater reliability (IRR) expresses the level of agreement among raters. We discuss three questions. The first question, "How to determine the IRR?", cannot be answered unequivocally. Many coefficients are available to estimate the IRR, but they often produce different results. The choice among coefficients depends on the research goal, the research design, and personal preference. The answer to the second question, "How to increase the IRR?", entails increasing the raters' expertise and improving the quality of the items, rubrics, and procedure. To answer the third question, "When is the IRR high enough?", we present a method to transform benchmarks for one IRR coefficient into benchmarks for another, but methodological research is required to provide a better answer to this question.
UR - http://www.scopus.com/inward/record.url?scp=85064133415&partnerID=8YFLogxK
M3 - Article
SN - 0165-0645
VL - 95
SP - 361
EP - 371
JO - Pedagogische Studiën
JF - Pedagogische Studiën
IS - 5-6
ER -