Systematic Study of Long Tail Phenomena in Entity Linking

Research output: Chapter in Book / Report / Conference proceeding > Conference contribution > Academic > peer-review

Abstract

State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However,
these scores should be considered in relation to the properties of the datasets they are evaluated
on. Until now, there has not been a systematic investigation of the properties of entity linking
datasets and their impact on system performance. In this paper we report on a series of hypotheses
regarding the long tail phenomena in entity linking datasets, their interaction, and their impact
on system performance. Our systematic study of these hypotheses shows that evaluation datasets
mainly capture head entities and only incidentally cover data from the tail, thus encouraging
systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases
of entity linking among the infrequent candidates of ambiguous forms. With our findings, we
hope to inspire future designs of both entity linking systems and evaluation datasets. To support
this goal, we provide a list of recommended actions for better inclusion of tail cases.
Original language: English
Title of host publication: Proceedings of the International Conference on Computational Linguistics (COLING 2018)
Publisher: International Conference on Computational Linguistics (COLING)
Pages: 664-674
Number of pages: 11
ISBN (Print): 9781948087506
Publication status: Published - 2018
Event: 27th International Conference on Computational Linguistics COLING 2018 - Santa Fe, NM
Duration: 20 Aug 2018 to 26 Aug 2018
Conference number: 27

Conference

Conference: 27th International Conference on Computational Linguistics COLING 2018
Abbreviated title: COLING 2018
City: Santa Fe, NM
Period: 20/08/18 to 26/08/18

Cite this

Ilievski, F., Vossen, P. T. J. M., & Schlobach, S. (2018). Systematic Study of Long Tail Phenomena in Entity Linking. In Proceedings of the International Conference on Computational Linguistics (COLING 2018) (pp. 664-674). [C18-1056] International Conference on Computational Linguistics (COLING).
Ilievski, F. ; Vossen, P.T.J.M. ; Schlobach, Stefan. / Systematic Study of Long Tail Phenomena in Entity Linking. Proceedings of the International Conference on Computational Linguistics (COLING 2018). International Conference on Computational Linguistics (COLING), 2018. pp. 664-674
@inproceedings{2faebe2c131048e2bb744e5af2f6fc7c,
title = "Systematic Study of Long Tail Phenomena in Entity Linking",
abstract = "State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However, these scores should be considered in relation to the properties of the datasets they are evaluated on. Until now, there has not been a systematic investigation of the properties of entity linking datasets and their impact on system performance. In this paper we report on a series of hypotheses regarding the long tail phenomena in entity linking datasets, their interaction, and their impact on system performance. Our systematic study of these hypotheses shows that evaluation datasets mainly capture head entities and only incidentally cover data from the tail, thus encouraging systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases of entity linking among the infrequent candidates of ambiguous forms. With our findings, we hope to inspire future designs of both entity linking systems and evaluation datasets. To support this goal, we provide a list of recommended actions for better inclusion of tail cases.",
author = "F. Ilievski and P.T.J.M. Vossen and Stefan Schlobach",
year = "2018",
language = "English",
isbn = "9781948087506",
pages = "664--674",
booktitle = "Proceedings of the International Conference on Computational Linguistics (COLING 2018)",
publisher = "International Conference on Computational Linguistics (COLING)",

}

Ilievski, F, Vossen, PTJM & Schlobach, S 2018, Systematic Study of Long Tail Phenomena in Entity Linking. in Proceedings of the International Conference on Computational Linguistics (COLING 2018)., C18-1056, International Conference on Computational Linguistics (COLING), pp. 664-674, 27th International Conference on Computational Linguistics COLING 2018, Santa Fe, NM, 20/08/18.

TY - GEN

T1 - Systematic Study of Long Tail Phenomena in Entity Linking

AU - Ilievski, F.

AU - Vossen, P.T.J.M.

AU - Schlobach, Stefan

PY - 2018

Y1 - 2018

N2 - State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However, these scores should be considered in relation to the properties of the datasets they are evaluated on. Until now, there has not been a systematic investigation of the properties of entity linking datasets and their impact on system performance. In this paper we report on a series of hypotheses regarding the long tail phenomena in entity linking datasets, their interaction, and their impact on system performance. Our systematic study of these hypotheses shows that evaluation datasets mainly capture head entities and only incidentally cover data from the tail, thus encouraging systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases of entity linking among the infrequent candidates of ambiguous forms. With our findings, we hope to inspire future designs of both entity linking systems and evaluation datasets. To support this goal, we provide a list of recommended actions for better inclusion of tail cases.

AB - State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However, these scores should be considered in relation to the properties of the datasets they are evaluated on. Until now, there has not been a systematic investigation of the properties of entity linking datasets and their impact on system performance. In this paper we report on a series of hypotheses regarding the long tail phenomena in entity linking datasets, their interaction, and their impact on system performance. Our systematic study of these hypotheses shows that evaluation datasets mainly capture head entities and only incidentally cover data from the tail, thus encouraging systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases of entity linking among the infrequent candidates of ambiguous forms. With our findings, we hope to inspire future designs of both entity linking systems and evaluation datasets. To support this goal, we provide a list of recommended actions for better inclusion of tail cases.

UR - https://aclanthology.info/events/coling-2018

M3 - Conference contribution

SN - 9781948087506

SP - 664

EP - 674

BT - Proceedings of the International Conference on Computational Linguistics (COLING 2018)

PB - International Conference on Computational Linguistics (COLING)

ER -

Ilievski F, Vossen PTJM, Schlobach S. Systematic Study of Long Tail Phenomena in Entity Linking. In Proceedings of the International Conference on Computational Linguistics (COLING 2018). International Conference on Computational Linguistics (COLING). 2018. p. 664-674. C18-1056