TY - JOUR
T1 - Analysis of named entity recognition and linking for tweets
AU - Derczynski, L.
AU - Maynard, D.
AU - Rizzo, G.
AU - van Erp, M.G.J.
AU - Gorrell, G.
AU - Troncy, R.
AU - Petrak, J.
AU - Bontcheva, K.
PY - 2015
Y1 - 2015
N2 - Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
U2 - 10.1016/j.ipm.2014.10.006
DO - 10.1016/j.ipm.2014.10.006
M3 - Article
SN - 0306-4573
VL - 51
SP - 32
EP - 49
JO - Information Processing and Management
JF - Information Processing and Management
IS - 2
ER -