Studying topical relevance with evidence-based crowdsourcing

Oana Inel, Zoltán Szlávik, Giannis Haralabopoulos, Elena Simperl, Dan Li, Evangelos Kanoulas, Christophe Van Gysel, Lora Aroyo

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, creating such test collections is laborious due to the large amounts of data that need to be annotated and the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of subjectivity and ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots experimenting with variables such as relevance scale, document granularity, annotation template, and the number of workers. Our results show a correlation between relevance assessment accuracy and smaller document granularity, i.e., aggregating relevance at the paragraph level yields better accuracy than assessment at the level of the full document. As expected, our results also show that collecting binary relevance judgments yields higher accuracy than the ternary scale used in the TREC annotation guidelines. Finally, the crowdsourced annotation tasks provided a more accurate document relevance ranking than a single assessor's relevance label. This work resulted in a reliable test collection for the TREC Common Core track.
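To make the aggregation idea in the abstract concrete, the following is a minimal, hypothetical sketch (not the paper's actual pipeline) of how paragraph-level binary crowd judgments could be aggregated into document-level relevance scores and a ranking. The example data, identifiers, and the mean-of-paragraph-scores rule are illustrative assumptions.

# Hypothetical sketch: aggregate paragraph-level binary relevance votes into
# document-level scores and a ranking. Data and aggregation rule are assumed
# for illustration only; they do not reproduce the paper's pipeline.
from collections import defaultdict
from statistics import mean

# (doc_id, paragraph_index) -> binary votes from crowd workers (1 = relevant)
judgments = {
    ("doc-1", 0): [1, 1, 0, 1, 1],
    ("doc-1", 1): [0, 0, 1, 0, 0],
    ("doc-2", 0): [1, 1, 1, 1, 0],
}

def paragraph_score(votes):
    # Fraction of workers who judged the paragraph relevant.
    return sum(votes) / len(votes)

def document_scores(judgments):
    # One possible document-level aggregation: mean of its paragraph scores.
    per_doc = defaultdict(list)
    for (doc_id, _), votes in judgments.items():
        per_doc[doc_id].append(paragraph_score(votes))
    return {doc_id: mean(scores) for doc_id, scores in per_doc.items()}

# Rank documents by aggregated crowd relevance, highest first.
ranking = sorted(document_scores(judgments).items(), key=lambda kv: -kv[1])
print(ranking)  # e.g. [('doc-2', 0.8), ('doc-1', 0.5)]

Under such a scheme, varying the vote scale, the unit of annotation, or the number of workers per unit corresponds directly to the experimental variables mentioned in the abstract.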

Language: English
Title of host publication: CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Editors: Norman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
Publisher: Association for Computing Machinery
Pages: 1253-1262
Number of pages: 10
ISBN (Electronic): 9781450360142
DOI: 10.1145/3269206.3271779
State: Published - 17 Oct 2018
Event: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: 22 Oct 2018 - 26 Oct 2018

Conference

Conference: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country: Italy
City: Torino
Period: 22/10/18 - 26/10/18

Fingerprint

  • Evidence-based
  • Annotation
  • Test collections
  • Subjectivity
  • Information retrieval
  • Relevance judgments
  • Workers
  • Ranking
  • Template
  • Intrinsic

Keywords

  • Crowdsourcing
  • IR evaluation
  • TREC Common Core track

Cite this

Inel, O., Szlávik, Z., Haralabopoulos, G., Simperl, E., Li, D., Kanoulas, E., ... Aroyo, L. (2018). Studying topical relevance with evidence-based crowdsourcing. In N. Paton, S. Candan, H. Wang, J. Allan, R. Agrawal, A. Labrinidis, A. Cuzzocrea, M. Zaki, D. Srivastava, A. Broder, ... A. Schuster (Eds.), CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1253-1262). Association for Computing Machinery. DOI: 10.1145/3269206.3271779
Inel, Oana ; Szlávik, Zoltán ; Haralabopoulos, Giannis ; Simperl, Elena ; Li, Dan ; Kanoulas, Evangelos ; Van Gysel, Christophe ; Aroyo, Lora. / Studying topical relevance with evidence-based crowdsourcing. CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management. editor / Norman Paton ; Selcuk Candan ; Haixun Wang ; James Allan ; Rakesh Agrawal ; Alexandros Labrinidis ; Alfredo Cuzzocrea ; Mohammed Zaki ; Divesh Srivastava ; Andrei Broder ; Assaf Schuster. Association for Computing Machinery, 2018. pp. 1253-1262
@inproceedings{422dc2800a504b7ab08dc2c416dcfedc,
title = "Studying topical relevance with evidence-based crowdsourcing",
abstract = "Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, creating such test collections is laborious due to the large amounts of data that need to be annotated and the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of subjectivity and ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots experimenting with variables such as relevance scale, document granularity, annotation template, and the number of workers. Our results show a correlation between relevance assessment accuracy and smaller document granularity, i.e., aggregating relevance at the paragraph level yields better accuracy than assessment at the level of the full document. As expected, our results also show that collecting binary relevance judgments yields higher accuracy than the ternary scale used in the TREC annotation guidelines. Finally, the crowdsourced annotation tasks provided a more accurate document relevance ranking than a single assessor's relevance label. This work resulted in a reliable test collection for the TREC Common Core track.",
keywords = "Crowdsourcing, IR evaluation, TREC Common Core track",
author = "Oana Inel and {Szlávik}, Zoltán and Giannis Haralabopoulos and Elena Simperl and Dan Li and Evangelos Kanoulas and {Van Gysel}, Christophe and Lora Aroyo",
year = "2018",
month = "10",
day = "17",
doi = "10.1145/3269206.3271779",
language = "English",
pages = "1253--1262",
editor = "Norman Paton and Selcuk Candan and Haixun Wang and James Allan and Rakesh Agrawal and Alexandros Labrinidis and Alfredo Cuzzocrea and Mohammed Zaki and Divesh Srivastava and Andrei Broder and Assaf Schuster",
booktitle = "CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management",
publisher = "Association for Computing Machinery",

}

Inel, O, Szlávik, Z, Haralabopoulos, G, Simperl, E, Li, D, Kanoulas, E, Van Gysel, C & Aroyo, L 2018, Studying topical relevance with evidence-based crowdsourcing. in N Paton, S Candan, H Wang, J Allan, R Agrawal, A Labrinidis, A Cuzzocrea, M Zaki, D Srivastava, A Broder & A Schuster (eds), CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, pp. 1253-1262, 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, 22/10/18. DOI: 10.1145/3269206.3271779

Studying topical relevance with evidence-based crowdsourcing. / Inel, Oana; Szlávik, Zoltán; Haralabopoulos, Giannis; Simperl, Elena; Li, Dan; Kanoulas, Evangelos; Van Gysel, Christophe; Aroyo, Lora.

CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ed. / Norman Paton; Selcuk Candan; Haixun Wang; James Allan; Rakesh Agrawal; Alexandros Labrinidis; Alfredo Cuzzocrea; Mohammed Zaki; Divesh Srivastava; Andrei Broder; Assaf Schuster. Association for Computing Machinery, 2018. p. 1253-1262.

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

TY - GEN

T1 - Studying topical relevance with evidence-based crowdsourcing

AU - Inel,Oana

AU - Szlávik,Zoltán

AU - Haralabopoulos,Giannis

AU - Simperl,Elena

AU - Li,Dan

AU - Kanoulas,Evangelos

AU - Van Gysel,Christophe

AU - Aroyo,Lora

PY - 2018/10/17

Y1 - 2018/10/17

N2 - Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, creating such test collections is laborious due to the large amounts of data that need to be annotated and the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of subjectivity and ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots experimenting with variables such as relevance scale, document granularity, annotation template, and the number of workers. Our results show a correlation between relevance assessment accuracy and smaller document granularity, i.e., aggregating relevance at the paragraph level yields better accuracy than assessment at the level of the full document. As expected, our results also show that collecting binary relevance judgments yields higher accuracy than the ternary scale used in the TREC annotation guidelines. Finally, the crowdsourced annotation tasks provided a more accurate document relevance ranking than a single assessor's relevance label. This work resulted in a reliable test collection for the TREC Common Core track.

AB - Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, creating such test collections is laborious due to the large amounts of data that need to be annotated and the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of subjectivity and ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots experimenting with variables such as relevance scale, document granularity, annotation template, and the number of workers. Our results show a correlation between relevance assessment accuracy and smaller document granularity, i.e., aggregating relevance at the paragraph level yields better accuracy than assessment at the level of the full document. As expected, our results also show that collecting binary relevance judgments yields higher accuracy than the ternary scale used in the TREC annotation guidelines. Finally, the crowdsourced annotation tasks provided a more accurate document relevance ranking than a single assessor's relevance label. This work resulted in a reliable test collection for the TREC Common Core track.

KW - Crowdsourcing

KW - IR evaluation

KW - TREC Common Core track

UR - http://www.scopus.com/inward/record.url?scp=85058057217&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058057217&partnerID=8YFLogxK

U2 - 10.1145/3269206.3271779

DO - 10.1145/3269206.3271779

M3 - Conference contribution

SP - 1253

EP - 1262

BT - CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management

PB - Association for Computing Machinery

ER -

Inel O, Szlávik Z, Haralabopoulos G, Simperl E, Li D, Kanoulas E et al. Studying topical relevance with evidence-based crowdsourcing. In Paton N, Candan S, Wang H, Allan J, Agrawal R, Labrinidis A, Cuzzocrea A, Zaki M, Srivastava D, Broder A, Schuster A, editors, CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery. 2018. p. 1253-1262. Available from, DOI: 10.1145/3269206.3271779