Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we present a new method to obtain large volumes of high-quality text corpora with event data for studying identity and reference relations. We report on the current methods to create event reference data by annotating texts and deriving the event data a posteriori. Our method starts from event registries in which event data is defined a priori. From this data, we extract so-called Microworlds of referential data with the Reference Texts that report on these events. This makes it possible to easily establish referential relations with high precision and at a large scale. In a pilot, we successfully obtained data from these resources with extreme ambiguity and variation, while maintaining the identity and reference relations and without having to annotate large quantities of texts word-by-word. The data from this pilot was annotated using an annotation tool created specifically in order to validate our method and to enrich the reference texts with event coreference annotations. This annotation process resulted in the Gun Violence Corpus, whose development process and outcome are described in this paper.
Language: English
Title of host publication: LREC2018, Miyazaki
State: Published - 2018

Cite this

@inproceedings{4ad527e286804cd1aeef15921662a614,
title = "Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data",
abstract = "In this paper, we present a new method to obtain large volumes of high-quality text corpora with event data for studying identity and reference relations. We report on the current methods to create event reference data by annotating texts and deriving the event data a posteriori. Our method starts from event registries in which event data is defined a priori. From this data, we extract so-called Microworlds of referential data with the Reference Texts that report on these events. This makes it possible to easily establish referential relations with high precision and at a large scale. In a pilot, we successfully obtained data from these resources with extreme ambiguity and variation, while maintaining the identity and reference relations and without having to annotate large quantities of texts word-by-word. The data from this pilot was annotated using an annotation tool created specifically in order to validate our method and to enrich the reference texts with event coreference annotations. This annotation process resulted in the Gun Violence Corpus, whose development process and outcome are described in this paper.",
keywords = "ulm1, ulm4",
author = "Piek Vossen and Filip Ilievski and Marten Postma and Roxane Segers",
year = "2018",
language = "English",
booktitle = "LREC2018, Miyazaki",
}

Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data. / Vossen, Piek; Ilievski, Filip; Postma, Marten; Segers, Roxane.

LREC2018, Miyazaki. 2018.

TY - GEN

T1 - Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data

AU - Vossen,Piek

AU - Ilievski,Filip

AU - Postma,Marten

AU - Segers,Roxane

PY - 2018

Y1 - 2018

AB - In this paper, we present a new method to obtain large volumes of high-quality text corpora with event data for studying identity and reference relations. We report on the current methods to create event reference data by annotating texts and deriving the event data a posteriori. Our method starts from event registries in which event data is defined a priori. From this data, we extract so-called Microworlds of referential data with the Reference Texts that report on these events. This makes it possible to easily establish referential relations with high precision and at a large scale. In a pilot, we successfully obtained data from these resources with extreme ambiguity and variation, while maintaining the identity and reference relations and without having to annotate large quantities of texts word-by-word. The data from this pilot was annotated using an annotation tool created specifically in order to validate our method and to enrich the reference texts with event coreference annotations. This annotation process resulted in the Gun Violence Corpus, whose development process and outcome are described in this paper.

KW - ulm1, ulm4

M3 - Conference contribution

BT - LREC2018, Miyazaki

ER -