TY - GEN
T1 - Closer Reading of RDF Generated by NLP on Wikipedia Biography
T2 - 17th Research Conference on Metadata and Semantic Research, MTSR 2023
AU - Sugimoto, Go
AU - Daza, Angel
AU - de Boer, Victor
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Although Wikidata and DBpedia are closely related to Wikipedia, they often hold only a small subset of its semantic information, due to the specific scopes and methodologies chosen for their Linked Data (LD) construction. When we look at biographies, a large number of RDF statements focus on the person’s core facts, and many semantic narratives are not included. To fill this knowledge gap, this paper seeks a solution with Natural Language Processing (NLP). We aim to assess to what extent out-of-the-box NLP tools can generate new LD from biographical articles in Wikipedia. Unlike other NLP research, we put more emphasis on the qualitative analysis of NLP outputs (“close reading”) than on the statistical performance of NLP algorithms (“distant reading”). We evaluate the overlaps and gaps between Wikipedia, Wikidata, and DBpedia, as well as other biographical ontologies. We also analyze the triple patterns from the NLP results in comparison with the RDF entities (instances) and ontologies. Our research revealed that we are able to capture new information about the entity that Wikidata and DBpedia do not hold. At the same time, some noise could not be easily eliminated. Our method presents a bottom-up approach to biographical ontology design. We also briefly propose a possible solution for future work.
AB - Although Wikidata and DBpedia are closely related to Wikipedia, they often hold only a small subset of its semantic information, due to the specific scopes and methodologies chosen for their Linked Data (LD) construction. When we look at biographies, a large number of RDF statements focus on the person’s core facts, and many semantic narratives are not included. To fill this knowledge gap, this paper seeks a solution with Natural Language Processing (NLP). We aim to assess to what extent out-of-the-box NLP tools can generate new LD from biographical articles in Wikipedia. Unlike other NLP research, we put more emphasis on the qualitative analysis of NLP outputs (“close reading”) than on the statistical performance of NLP algorithms (“distant reading”). We evaluate the overlaps and gaps between Wikipedia, Wikidata, and DBpedia, as well as other biographical ontologies. We also analyze the triple patterns from the NLP results in comparison with the RDF entities (instances) and ontologies. Our research revealed that we are able to capture new information about the entity that Wikidata and DBpedia do not hold. At the same time, some noise could not be easily eliminated. Our method presents a bottom-up approach to biographical ontology design. We also briefly propose a possible solution for future work.
KW - Biographies
KW - DBpedia
KW - Henry VIII
KW - Knowledge Graph
KW - Linked Data
KW - Natural Language Processing
KW - Wikidata
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85200981760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85200981760&partnerID=8YFLogxK
UR - https://link.springer.com/book/10.1007/978-3-031-65990-4
U2 - 10.1007/978-3-031-65990-4_4
DO - 10.1007/978-3-031-65990-4_4
M3 - Conference contribution
AN - SCOPUS:85200981760
SN - 9783031659898
T3 - Communications in Computer and Information Science
SP - 41
EP - 54
BT - Metadata and Semantic Research
A2 - Garoufallou, Emmanouel
A2 - Sartori, Fabio
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 25 October 2023 through 27 October 2023
ER -