Crowdsourcing ground truth for medical relation extraction

Anca Dumitrache, Lora Aroyo, Chris Welty

Research output: Contribution to Journal / Article / Academic / peer-review

Abstract

Cognitive computing systems require human-labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task, while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
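
The abstract mentions weighted measures for precision, recall, and F-measure but does not give their definitions. As a rough illustration only, the sketch below shows one generic way such weighting could work: each sentence contributes to the true-positive, false-positive, and false-negative totals in proportion to a per-sentence weight rather than counting as 1. The function name `weighted_prf`, the boolean label encoding, and the choice of weights are assumptions made for this example, not the definitions used in the article.

```python
from typing import Sequence, Tuple


def weighted_prf(
    gold: Sequence[bool],
    pred: Sequence[bool],
    weights: Sequence[float],
) -> Tuple[float, float, float]:
    """Weighted precision, recall, and F1 for one binary relation.

    Hypothetical helper: each item adds its weight (instead of 1) to the
    true-positive, false-positive, or false-negative total. How the weights
    are derived (for example, from annotator agreement) is left open here.
    """
    if not (len(gold) == len(pred) == len(weights)):
        raise ValueError("gold, pred, and weights must have the same length")

    tp = sum(w for g, p, w in zip(gold, pred, weights) if g and p)
    fp = sum(w for g, p, w in zip(gold, pred, weights) if not g and p)
    fn = sum(w for g, p, w in zip(gold, pred, weights) if g and not p)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: four sentences, with lower weights standing in for
    # sentences that are assumed to be more ambiguous.
    gold = [True, True, False, False]
    pred = [True, False, True, False]
    weights = [1.0, 0.5, 0.25, 1.0]
    print(weighted_prf(gold, pred, weights))
```

With all weights set to 1.0 the function reduces to the ordinary unweighted precision, recall, and F1, which makes it easy to compare the two views on the same predictions.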

Language: English
Article number: 11
Pages: 1-20
Number of pages: 20
Journal: ACM Transactions on Interactive Intelligent Systems
Volume: 8
Issue number: 2
Early online date: Jun 2018
DOI: 10.1145/3152889
Publication status: Published - Jul 2018

Fingerprint

Learning systems
Costs
Experiments

Keywords

  • Clinical natural language processing
  • Crowd Truth
  • CrowdTruth
  • Ground truth
  • Inter-annotator disagreement
  • Natural language ambiguity
  • Relation extraction

Cite this

@article{fd0b8fa6df2748e3a2fa1887b56539d8,
title = "Crowdsourcing ground truth for medical relation extraction",
keywords = "Clinical natural language processing, Crowd Truth, CrowdTruth, Ground truth, Inter-annotator disagreement, Natural language ambiguity, Relation extraction",
author = "Anca Dumitrache and Lora Aroyo and Chris Welty",
year = "2018",
month = "7",
doi = "10.1145/3152889",
language = "English",
volume = "8",
pages = "1--20",
journal = "ACM Transactions on Interactive Intelligent Systems",
issn = "2160-6455",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

Crowdsourcing ground truth for medical relation extraction. / Dumitrache, Anca; Aroyo, Lora; Welty, Chris.

In: ACM Transactions on Interactive Intelligent Systems, Vol. 8, No. 2, 11, 07.2018, p. 1-20.

TY - JOUR

T1 - Crowdsourcing ground truth for medical relation extraction

AU - Dumitrache, Anca

AU - Aroyo, Lora

AU - Welty, Chris

PY - 2018/7

Y1 - 2018/7

AB - Cognitive computing systems require human-labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task, while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.

KW - Clinical natural language processing

KW - Crowd Truth

KW - CrowdTruth

KW - Ground truth

KW - Inter-annotator disagreement

KW - Natural language ambiguity

KW - Relation extraction

UR - http://www.scopus.com/inward/record.url?scp=85055101001&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055101001&partnerID=8YFLogxK

U2 - 10.1145/3152889

DO - 10.1145/3152889

M3 - Article

VL - 8

SP - 1

EP - 20

JO - ACM Transactions on Interactive Intelligent Systems

T2 - ACM Transactions on Interactive Intelligent Systems

JF - ACM Transactions on Interactive Intelligent Systems

SN - 2160-6455

IS - 2

M1 - 11

ER -