Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In this paper, we present a new method to obtain large volumes of high-quality text corpora with event data for studying identity and reference relations. We report on the current methods to create event reference data by annotating texts and deriving the event data a posteriori. Our method starts from event registries in which event data is defined a priori. From this data, we extract so-called Microworlds of referential data with the Reference Texts that report on these events. This makes it possible to easily establish referential relations with high precision and at a large scale. In a pilot, we successfully obtained data from these resources with extreme ambiguity and variation, while maintaining the identity and reference relations and without having to annotate large quantities of texts word-by-word. The data from this pilot was annotated using an annotation tool created specifically in order to validate our method and to enrich the reference texts with event coreference annotations. This annotation process resulted in the Gun Violence Corpus, whose development process and outcome are described in this paper.
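The data-to-text direction described in the abstract can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the `EventRecord` and `ReferenceText` schemas, the `build_microworlds` helper, and the toy records are all invented here. The point it shows is that when a registry already links each text to a structured event, grouping texts by event identifier yields document-level reference relations without word-by-word annotation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EventRecord:
    """Structured event data, defined a priori in a registry (hypothetical schema)."""
    event_id: str
    event_type: str
    date: str
    location: str
    participants: List[str] = field(default_factory=list)


@dataclass
class ReferenceText:
    """A text that the registry links to exactly one event (hypothetical schema)."""
    text_id: str
    event_id: str
    body: str


def build_microworlds(
    events: List[EventRecord], texts: List[ReferenceText]
) -> Dict[str, Tuple[EventRecord, List[ReferenceText]]]:
    """Pair each event record with the texts the registry links to it."""
    texts_by_event = defaultdict(list)
    for text in texts:
        texts_by_event[text.event_id].append(text)
    return {e.event_id: (e, texts_by_event[e.event_id]) for e in events}


# Invented toy data: two similar events on the same day in different places.
events = [
    EventRecord("e1", "shooting", "2017-01-05", "Springfield", ["P1"]),
    EventRecord("e2", "shooting", "2017-01-05", "Shelbyville", ["P2"]),
]
texts = [
    ReferenceText("t1", "e1", "A man was shot on Thursday ..."),
    ReferenceText("t2", "e1", "Police are investigating a shooting ..."),
    ReferenceText("t3", "e2", "One person was wounded in a shooting ..."),
]

microworlds = build_microworlds(events, texts)
```

In this toy setup, `t1` and `t2` are known to refer to the same event because both carry the identifier `e1`, while `t3` is kept apart despite describing a similar incident on the same date. A validation step over such pairs would then only need to confirm or enrich the links, not create them from scratch.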
Language: English
Title of host publication: [Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Place of Publication: Miyazaki
Publisher: LREC
Pages: 1-9
Number of pages: 9
ISBN (Electronic): 9791095546009
State: Published - 2018
Event: Eleventh International Conference on Language Resources and Evaluation (2018) - Miyazaki, Japan
Duration: 7 May 2018 to 12 May 2018
Conference number: 11
http://lrec2018.lrec-conf.org/en/

Conference

Conference: Eleventh International Conference on Language Resources and Evaluation (2018)
Abbreviated title: LREC2018, Miyazaki
Country: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18

Keywords

  • ulm1, ulm4

Cite this

Vossen, P., Ilievski, F., Postma, M., & Segers, R. (2018). Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data. In [Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 1-9). Miyazaki: LREC.
Vossen, Piek ; Ilievski, Filip ; Postma, Marten ; Segers, Roxane. / Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data. [Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki : LREC, 2018. pp. 1-9
@inproceedings{4ad527e286804cd1aeef15921662a614,
title = "Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data",
keywords = "ulm1, ulm4",
author = "Piek Vossen and Filip Ilievski and Marten Postma and Roxane Segers",
year = "2018",
language = "English",
pages = "1--9",
booktitle = "[Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018)",
publisher = "LREC",

}

Vossen, P, Ilievski, F, Postma, M & Segers, R 2018, Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data. in [Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018). LREC, Miyazaki, pp. 1-9, Eleventh International Conference on Language Resources and Evaluation (2018), Miyazaki, Japan, 7/05/18.

TY - GEN

T1 - Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data

AU - Vossen,Piek

AU - Ilievski,Filip

AU - Postma,Marten

AU - Segers,Roxane

PY - 2018

Y1 - 2018

KW - ulm1, ulm4

UR - http://lrec2018.lrec-conf.org/en/

M3 - Conference contribution

SP - 1

EP - 9

BT - [Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

PB - LREC

CY - Miyazaki

ER -

Vossen P, Ilievski F, Postma M, Segers R. Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data. In [Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki: LREC. 2018. p. 1-9.