Extracting N-ary Facts from Wikipedia Table Clusters

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Tables in Wikipedia articles contain a wealth of knowledge that would be useful for many applications if it were structured in a more coherent, queryable form. An important problem is that many such tables contain the same type of knowledge but have different layouts and/or schemata. Moreover, some tables refer to entities that we can link to Knowledge Bases (KBs), while others do not. Finally, some tables express entity-attribute relations, while others contain more complex n-ary relations. We propose a novel knowledge extraction technique that tackles these problems. Our method first transforms and clusters similar tables into fewer unified ones to overcome the problem of table diversity. Then, the unified tables are linked to the KB so that knowledge about popular entities propagates to the unpopular ones. Finally, our method applies a technique that relies on functional dependencies to judiciously interpret the table and extract n-ary relations. Our experiments over 1.5M Wikipedia tables show that our clustering can group many semantically similar tables. This leads to the extraction of many novel n-ary relations.
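
As an illustration of the general notion of a functional dependency mentioned in the abstract, the following minimal Python sketch checks which column sets of a small table determine a target column. It is not the method of the paper: the helper names (`holds_fd`, `minimal_determinants`) and the toy table are hypothetical, and the brute-force search is only meant to show why a value that depends on several columns at once is naturally read as an n-ary fact rather than a binary entity-attribute pair.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine column `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # Two rows with the same key but different rhs values violate the FD.
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True


def minimal_determinants(rows, columns, rhs):
    """Return the smallest column sets (excluding `rhs`) that determine `rhs`."""
    candidates = [c for c in columns if c != rhs]
    for size in range(1, len(candidates) + 1):
        found = [lhs for lhs in combinations(candidates, size)
                 if holds_fd(rows, lhs, rhs)]
        if found:
            return found
    return []


# Hypothetical toy table (illustrative data only, not from the paper).
columns = ["year", "event", "athlete", "time"]
rows = [
    {"year": 2018, "event": "100 m", "athlete": "A", "time": 9.9},
    {"year": 2018, "event": "200 m", "athlete": "A", "time": 19.8},
    {"year": 2019, "event": "100 m", "athlete": "A", "time": 9.8},
    {"year": 2019, "event": "100 m", "athlete": "B", "time": 10.1},
]

print(minimal_determinants(rows, columns, "time"))
# → [('year', 'event', 'athlete')]
```

In this toy table no single column or pair of columns determines `time`, so each row is best read as one 4-ary fact over (year, event, athlete, time) rather than as a set of independent entity-attribute pairs.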

Original language: English
Title of host publication: CIKM '20
Subtitle of host publication: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
Publisher: Association for Computing Machinery
Pages: 655-664
Number of pages: 10
ISBN (Electronic): 9781450368599
DOIs
Publication status: Published - Oct 2020
Event: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 - Virtual, Online, Ireland
Duration: 19 Oct 2020 → 23 Oct 2020

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Country/Territory: Ireland
City: Virtual, Online
Period: 19/10/20 → 23/10/20

Bibliographical note

Publisher Copyright:
© 2020 ACM.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

  • data integration
  • knowledge extraction
  • n-ary relation
  • wikipedia tables
