Abstract
Tables in Wikipedia articles contain a wealth of knowledge that would be useful for many applications if it were structured in a more coherent, queryable form. An important problem is that many such tables contain the same type of knowledge but have different layouts and/or schemata. Moreover, some tables refer to entities that we can link to Knowledge Bases (KBs), while others do not. Finally, some tables express entity-attribute relations, while others contain more complex n-ary relations. We propose a novel knowledge extraction technique that tackles these problems. Our method first transforms and clusters similar tables into fewer unified ones to overcome the problem of table diversity. Then, the unified tables are linked to the KB so that knowledge about popular entities propagates to the unpopular ones. Finally, our method applies a technique that relies on functional dependencies to judiciously interpret the table and extract n-ary relations. Our experiments over 1.5M Wikipedia tables show that our clustering can group many semantically similar tables, which leads to the extraction of many novel n-ary relations.
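The abstract does not spell out how functional dependencies are used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a table is a list of row dictionaries, and it checks which smallest sets of columns determine a target column. The helper names `fd_holds` and `minimal_determinants` and the toy table are hypothetical and invented for illustration.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine column `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # The first value seen for a key is stored; any later mismatch violates the FD.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_determinants(rows, columns, rhs):
    """Return the smallest column sets (excluding `rhs`) that determine `rhs`."""
    candidates = [c for c in columns if c != rhs]
    for size in range(1, len(candidates) + 1):
        found = [
            combo
            for combo in combinations(candidates, size)
            if fd_holds(rows, combo, rhs)
        ]
        if found:
            return found
    return []


# Toy table, invented for illustration (not taken from the paper's data).
columns = ["athlete", "year", "event", "medal"]
rows = [
    {"athlete": "A", "year": 2016, "event": "100m", "medal": "gold"},
    {"athlete": "A", "year": 2020, "event": "100m", "medal": "silver"},
    {"athlete": "B", "year": 2016, "event": "100m", "medal": "silver"},
    {"athlete": "B", "year": 2016, "event": "200m", "medal": "gold"},
]

print(minimal_determinants(rows, columns, "medal"))
```

In this toy table no single column or pair of columns determines `medal`, but the three columns together do. Under the assumptions above, that suggests the row is better read as one n-ary relation than as a set of binary entity-attribute facts.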
Original language | English |
---|---|
Title of host publication | CIKM '20 |
Subtitle of host publication | Proceedings of the 29th ACM International Conference on Information & Knowledge Management |
Publisher | Association for Computing Machinery |
Pages | 655-664 |
Number of pages | 10 |
ISBN (Electronic) | 9781450368599 |
Publication status | Published - Oct 2020 |
Event | 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 - Virtual, Online, Ireland. Duration: 19 Oct 2020 → 23 Oct 2020 |
Publication series
Name | International Conference on Information and Knowledge Management, Proceedings |
---|
Conference
Conference | 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 |
---|---|
Country/Territory | Ireland |
City | Virtual, Online |
Period | 19/10/20 → 23/10/20 |
Bibliographical note
Publisher Copyright: © 2020 ACM.
Keywords
- data integration
- knowledge extraction
- n-ary relation
- wikipedia tables