CrowdTruth measures for language ambiguity: the case of medical relation extraction

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even outperform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sentences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this paper we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are inadequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classifier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance than traditional unweighted metrics like precision and recall, because they allow us to account for the rather obvious fact that some sentences express the target relations more clearly than others.
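To make the weighted evaluation concrete, below is a minimal Python sketch of how ambiguity-weighted precision and recall might be computed. It assumes the CrowdTruth sentence-relation score, i.e. the cosine similarity between the aggregated crowd annotation vector for a sentence and the unit vector of the target relation, as the per-sentence weight; the function names and the exact weighting of false positives are illustrative assumptions, not the paper's definitive formulation.

import numpy as np

def sentence_relation_score(worker_vectors, relation_index):
    # CrowdTruth-style sentence-relation score: cosine similarity between
    # the aggregated sentence vector (sum of per-worker annotation vectors)
    # and the unit vector for one target relation.
    sentence_vector = np.sum(worker_vectors, axis=0).astype(float)
    unit = np.zeros_like(sentence_vector)
    unit[relation_index] = 1.0
    norm = np.linalg.norm(sentence_vector)  # unit vector has norm 1
    return float(sentence_vector @ unit / norm) if norm else 0.0

def weighted_precision_recall(scores, predictions, threshold=0.5):
    # Ambiguity-weighted precision/recall: each sentence contributes its
    # crowd score rather than a hard 0/1 label, so clearly expressed
    # sentences weigh more than ambiguous ones. (Illustrative scheme.)
    scores = np.asarray(scores, dtype=float)
    predictions = np.asarray(predictions, dtype=bool)
    positives = scores >= threshold                      # crowd-positive sentences
    tp = scores[predictions & positives].sum()           # weighted hits
    fp = (1.0 - scores[predictions & ~positives]).sum()  # weighted false alarms
    fn = scores[~predictions & positives].sum()          # weighted misses
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: three workers annotate one sentence over four candidate relations.
workers = np.array([[1, 0, 0, 0],
                    [1, 0, 1, 0],
                    [0, 0, 1, 0]])
print(sentence_relation_score(workers, relation_index=0))   # ~0.71

scores = [0.9, 0.6, 0.4, 0.1]        # crowd sentence-relation scores
preds = [True, True, True, False]    # classifier decisions
print(weighted_precision_recall(scores, preds))             # (~0.71, 1.0)

The design choice sketched here is that a false positive on a highly ambiguous sentence (low crowd score) costs less than one on a clear negative, mirroring the abstract's point that some sentences express the target relations more clearly than others.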

Language: English
Pages: 7-19
Number of pages: 13
Journal: CEUR Workshop Proceedings
Volume: 1467
Publication status: Published - 2015

Cite this

@article{93c08030e5f844359f953194afbb6b23,
title = "CrowdTruth measures for language ambiguity the case of medical relation extraction",
abstract = "A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even out-perform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sen-tences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this pa-per we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are in-adequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classiffier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance over traditional unweighted metrics like precision and recall, because they allow us to ac-count for the rather obvious fact that some sentences express the target relations more clearly than others.",
author = "Anca Dumitrache and Lora Aroyo and C.A. Welty",
year = "2015",
language = "English",
volume = "1467",
pages = "7--19",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR Workshop Proceedings",

}

CrowdTruth measures for language ambiguity: the case of medical relation extraction. / Dumitrache, Anca; Aroyo, Lora; Welty, C.A.

In: CEUR Workshop Proceedings, Vol. 1467, 2015, p. 7-19.

Research output: Contribution to Journal › Article › Academic › peer-review

TY - JOUR

T1 - CrowdTruth measures for language ambiguity: the case of medical relation extraction

AU - Dumitrache, Anca

AU - Aroyo, Lora

AU - Welty, C.A.

PY - 2015

Y1 - 2015

N2 - A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even outperform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sentences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this paper we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are inadequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classifier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance than traditional unweighted metrics like precision and recall, because they allow us to account for the rather obvious fact that some sentences express the target relations more clearly than others.

AB - A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even outperform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sentences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this paper we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are inadequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classifier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance than traditional unweighted metrics like precision and recall, because they allow us to account for the rather obvious fact that some sentences express the target relations more clearly than others.

UR - http://www.scopus.com/inward/record.url?scp=84949809058&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949809058&partnerID=8YFLogxK

M3 - Article

VL - 1467

SP - 7

EP - 19

JO - CEUR Workshop Proceedings

T2 - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -