Derczynski, L., Maynard, D., Rizzo, G. et al. (5 more authors) (2014) Analysis of named entity recognition and linking for tweets. Information Processing & Management, 51 (2). pp. 32-49. ISSN 0306-4573
Abstract
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
Metadata
Authors/Creators: | Derczynski, L., Maynard, D., Rizzo, G. et al. (5 more authors) |
---|---|
Copyright, Publisher and Additional Information: | © 2015 Published by Elsevier Ltd. This is an author-produced version of a paper subsequently published in Information Processing & Management. Uploaded in accordance with the publisher's self-archiving policy. Article available under the terms of the CC-BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/) |
Keywords: | Information extraction; Named entity recognition; Entity disambiguation; Microblogs; Twitter |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Dec 2015 15:57 |
Last Modified: | 19 Nov 2017 01:38 |
Published Version: | http://dx.doi.org/10.1016/j.ipm.2014.10.006 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | https://doi.org/10.1016/j.ipm.2014.10.006 |