Crowdsourcing inclusivity: Dealing with diversity of opinions, perspectives and ambiguity in annotated data: The Crowdtruth tutorial

Lora Aroyo, Zoltán Szlávik, Anca Dumitrache, Benjamin Timmermans, Oana Inel, Chris Welty

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In this tutorial, we introduce a novel crowdsourcing methodology called CrowdTruth [1, 9]. The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provide more reliable, realistic and inclusive real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. CrowdTruth is a widely used crowdsourcing methodology adopted by industrial partners and public organizations such as Google, IBM, the New York Times, Cleveland Clinic, Crowdynews, the Sound and Vision archive, and the Rijksmuseum, and applied in a multitude of domains such as AI, news, medicine, social media, cultural heritage, and social sciences. The goal of this tutorial is to introduce the audience to a novel approach to crowdsourcing that takes advantage of the diversity of opinions and perspectives that is inherent to the Web, as methods that deal with disagreement and diversity in crowdsourcing have become increasingly popular. Creating this more complex notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse and inclusive.

Original language: English
Title of host publication: The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
Publisher: Association for Computing Machinery, Inc
Pages: 1294-1295
Number of pages: 2
ISBN (Electronic): 9781450366755
DOIs: https://doi.org/10.1145/3308558.3320096
Publication status: Published - 13 May 2019
Event: 2019 World Wide Web Conference, WWW 2019 - San Francisco, United States
Duration: 13 May 2019 to 17 May 2019

Conference

Conference: 2019 World Wide Web Conference, WWW 2019
Country: United States
City: San Francisco
Period: 13/05/19 to 17/05/19

Fingerprint

Social sciences
Medicine
Learning systems
Acoustic waves

Keywords

  • Ambiguity
  • Computational Social Sciences
  • Crowdsourcing
  • Digital Humanities
  • Diversity
  • Ground Truth
  • Inter-annotator Disagreement
  • Medical Text Annotation
  • Perspectives

Cite this

Aroyo, L., Szlávik, Z., Dumitrache, A., Timmermans, B., Inel, O., & Welty, C. (2019). Crowdsourcing inclusivity: Dealing with diversity of opinions, perspectives and ambiguity in annotated data: The Crowdtruth tutorial. In The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019 (pp. 1294-1295). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308558.3320096
@inproceedings{8fc62aba145940eab271b20e55a88a2c,
title = "Crowdsourcing inclusivity: Dealing with diversity of opinions, perspectives and ambiguity in annotated data: The Crowdtruth tutorial",
abstract = "In this tutorial, we introduce a novel crowdsourcing methodology called CrowdTruth [1, 9]. The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provide more reliable, realistic and inclusive real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. CrowdTruth is a widely used crowdsourcing methodology adopted by industrial partners and public organizations such as Google, IBM, the New York Times, Cleveland Clinic, Crowdynews, the Sound and Vision archive, and the Rijksmuseum, and applied in a multitude of domains such as AI, news, medicine, social media, cultural heritage, and social sciences. The goal of this tutorial is to introduce the audience to a novel approach to crowdsourcing that takes advantage of the diversity of opinions and perspectives that is inherent to the Web, as methods that deal with disagreement and diversity in crowdsourcing have become increasingly popular. Creating this more complex notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse and inclusive.",
keywords = "Ambiguity, Computational Social Sciences, Crowdsourcing, Digital Humanities, Diversity, Ground Truth, Inter-annotator Disagreement, Medical Text Annotation, Perspectives",
author = "Lora Aroyo and Zolt{\'a}n Szl{\'a}vik and Anca Dumitrache and Benjamin Timmermans and Oana Inel and Chris Welty",
year = "2019",
month = "5",
day = "13",
doi = "10.1145/3308558.3320096",
language = "English",
pages = "1294--1295",
booktitle = "The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019",
publisher = "Association for Computing Machinery, Inc",
}

TY - GEN
T1 - Crowdsourcing inclusivity
T2 - Dealing with diversity of opinions, perspectives and ambiguity in annotated data: The Crowdtruth tutorial
AU - Aroyo, Lora
AU - Szlávik, Zoltán
AU - Dumitrache, Anca
AU - Timmermans, Benjamin
AU - Inel, Oana
AU - Welty, Chris
PY - 2019/5/13
Y1 - 2019/5/13
AB - In this tutorial, we introduce a novel crowdsourcing methodology called CrowdTruth [1, 9]. The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provide more reliable, realistic and inclusive real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. CrowdTruth is a widely used crowdsourcing methodology adopted by industrial partners and public organizations such as Google, IBM, the New York Times, Cleveland Clinic, Crowdynews, the Sound and Vision archive, and the Rijksmuseum, and applied in a multitude of domains such as AI, news, medicine, social media, cultural heritage, and social sciences. The goal of this tutorial is to introduce the audience to a novel approach to crowdsourcing that takes advantage of the diversity of opinions and perspectives that is inherent to the Web, as methods that deal with disagreement and diversity in crowdsourcing have become increasingly popular. Creating this more complex notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse and inclusive.
KW - Ambiguity
KW - Computational Social Sciences
KW - Crowdsourcing
KW - Digital Humanities
KW - Diversity
KW - Ground Truth
KW - Inter-annotator Disagreement
KW - Medical Text Annotation
KW - Perspectives
UR - http://www.scopus.com/inward/record.url?scp=85066899250&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066899250&partnerID=8YFLogxK
U2 - 10.1145/3308558.3320096
DO - 10.1145/3308558.3320096
M3 - Conference contribution
SP - 1294
EP - 1295
BT - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
PB - Association for Computing Machinery, Inc
ER -
