
The role of knowledge in determining identity of long-tail entities

Research output: Contribution to Journal › Article › Academic › peer-review


Abstract

NIL entities do not have an accessible representation, which means that their identity cannot be established through traditional disambiguation. Consequently, they have received little attention in entity linking systems and tasks so far. Given the non-redundancy of knowledge on NIL entities, the lack of frequency priors, their potentially extreme ambiguity, and their sheer number, they form an extreme class of long-tail entities and pose a great challenge for state-of-the-art systems. In this paper, we investigate the role of knowledge in establishing the identity of NIL entities mentioned in text. What kind of knowledge can be applied to establish the identity of NILs? Can we potentially link to them at a later point? How can we capture implicit knowledge and fill knowledge gaps in communication? We formulate and test hypotheses to provide insight into these questions. Because instance-level knowledge is unavailable, we propose to enrich the locally extracted information with profiling models that rely on background knowledge in Wikidata. We describe and implement two profiling machines based on state-of-the-art neural models. We evaluate their intrinsic behavior and their impact on the task of determining the identity of NIL entities.

Original language: English
Article number: 100565
Pages (from-to): 1-18
Number of pages: 18
Journal: Journal of Web Semantics
Volume: 61-62
Publication status: Published - Mar 2020

Funding

The research reported in this paper has been funded by the Netherlands Organisation for Scientific Research (NWO) via the Spinoza fund.

Keywords

  • Knowledge-based completion
  • Long-tail entities
  • NIL clustering
