TY - GEN
T1 - Using Distributional Semantics for Automatic Taxonomy Induction
AU - Zafar, Bushra
AU - Cochez, Michael
AU - Qamar, Usman
PY - 2017/2/27
Y1 - 2017/2/27
N2 - Semantic taxonomies are powerful tools that provide structured knowledge to Natural Language Processing (NLP), Information Retrieval (IR), and general Artificial Intelligence (AI) systems. These taxonomies are extensively used for solving knowledge-rich problems such as textual entailment and question answering. In this paper, we present a taxonomy induction system and evaluate it using the benchmarks provided in the Taxonomy Extraction Evaluation (TExEval2) Task. The task is to identify hyponym-hypernym relations and to construct a taxonomy from a given domain-specific list. Our approach combines a word embedding trained on a large corpus with string-matching techniques. The overall approach is semi-supervised. We propose a generic algorithm that uses the embedding vectors to identify hyponym-hypernym relations and to induce the taxonomy. The system generated English-language taxonomies for three domains (environment, food, and science), which were evaluated against gold-standard taxonomies. The system achieved good results for hyponym-hypernym identification and taxonomy induction, especially when compared to other tools using similar background knowledge.
AB - Semantic taxonomies are powerful tools that provide structured knowledge to Natural Language Processing (NLP), Information Retrieval (IR), and general Artificial Intelligence (AI) systems. These taxonomies are extensively used for solving knowledge-rich problems such as textual entailment and question answering. In this paper, we present a taxonomy induction system and evaluate it using the benchmarks provided in the Taxonomy Extraction Evaluation (TExEval2) Task. The task is to identify hyponym-hypernym relations and to construct a taxonomy from a given domain-specific list. Our approach combines a word embedding trained on a large corpus with string-matching techniques. The overall approach is semi-supervised. We propose a generic algorithm that uses the embedding vectors to identify hyponym-hypernym relations and to induce the taxonomy. The system generated English-language taxonomies for three domains (environment, food, and science), which were evaluated against gold-standard taxonomies. The system achieved good results for hyponym-hypernym identification and taxonomy induction, especially when compared to other tools using similar background knowledge.
UR - http://www.scopus.com/inward/record.url?scp=85026878389&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026878389&partnerID=8YFLogxK
U2 - 10.1109/FIT.2016.070
DO - 10.1109/FIT.2016.070
M3 - Conference contribution
AN - SCOPUS:85026878389
T3 - Proceedings - 14th International Conference on Frontiers of Information Technology, FIT 2016
SP - 348
EP - 353
BT - Proceedings - 14th International Conference on Frontiers of Information Technology, FIT 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Conference on Frontiers of Information Technology, FIT 2016
Y2 - 19 December 2016 through 21 December 2016
ER -