Semantic overfitting: what 'world' do we consider when evaluating disambiguation of text?

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Semantic text processing faces the challenge of defining the relation between lexical expressions and the world to which they refer within a period of time. It is unclear whether the test sets currently used to evaluate disambiguation tasks are representative of the full complexity of this time-anchored relation, which results in semantic overfitting to a specific period and to the phenomena frequent within it. We conceptualize and formalize a set of metrics that evaluate this complexity of datasets. We provide evidence for their applicability on five different disambiguation tasks. To challenge semantic overfitting of disambiguation systems, we propose a time-based, metric-aware method for developing datasets in a systematic and semi-automated manner, as well as an event-based QA task.
Original language: English
Title of host publication: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Pages: 1180-1191
Number of pages: 12
Publication status: Published - 2016
