Evaluating named entity recognition tools for extracting social networks from novels

Niels Dekker, Tobias Kuhn, Marieke van Erp

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems.

Original language: English
Article number: e189
Pages (from-to): 1-29
Number of pages: 29
Journal: PeerJ Computer Science
Volume: 2019
Issue number: 4
DOI: 10.7717/peerj-cs.189
Publication status: Published - 18 Apr 2019
Keywords

  • Classic and modern literature
  • Cultural AI
  • Digital humanities
  • Evaluation
  • Named entity recognition
  • Social networks

Cite this

@article{da057895fcad4a9c91479c6a814fb83b,
title = "Evaluating named entity recognition tools for extracting social networks from novels",
abstract = "The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems.",
keywords = "Classic and modern literature, Cultural AI, Digital humanities, Evaluation, Named entity recognition, Social networks",
author = "Niels Dekker and Tobias Kuhn and {van Erp}, Marieke",
year = "2019",
month = "4",
day = "18",
doi = "10.7717/peerj-cs.189",
language = "English",
volume = "2019",
pages = "1--29",
journal = "PeerJ Computer Science",
issn = "2376-5992",
publisher = "PeerJ",
number = "4",
}

Evaluating named entity recognition tools for extracting social networks from novels. / Dekker, Niels; Kuhn, Tobias; van Erp, Marieke.

In: PeerJ Computer Science, Vol. 2019, No. 4, e189, 18.04.2019, p. 1-29.

TY - JOUR

T1 - Evaluating named entity recognition tools for extracting social networks from novels

AU - Dekker, Niels

AU - Kuhn, Tobias

AU - van Erp, Marieke

PY - 2019/4/18

Y1 - 2019/4/18

AB - The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems.

KW - Classic and modern literature

KW - Cultural AI

KW - Digital humanities

KW - Evaluation

KW - Named entity recognition

KW - Social networks

UR - http://www.scopus.com/inward/record.url?scp=85074134957&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074134957&partnerID=8YFLogxK

U2 - 10.7717/peerj-cs.189

DO - 10.7717/peerj-cs.189

M3 - Article

VL - 2019

SP - 1

EP - 29

JO - PeerJ Computer Science

JF - PeerJ Computer Science

SN - 2376-5992

IS - 4

M1 - e189

ER -