Derczynski, L., Maynard, D., Rizzo, G. et al. (5 more authors) (2014) Analysis of named entity recognition and linking for tweets. Information Processing & Management, 51 (2). 32-49. ISSN 0306-4573
Abstract
Applying natural language processing to tweets (a form of microblog) for mining and intelligent information access is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2015 Published by Elsevier Ltd. This is an author produced version of a paper subsequently published in Information Processing & Management. Uploaded in accordance with the publisher's self-archiving policy. Article available under the terms of the CC-BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/) |
Keywords: | Information extraction; Named entity recognition; Entity disambiguation; Microblogs; Twitter |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Dec 2015 15:57 |
Last Modified: | 19 Nov 2017 01:38 |
Published Version: | http://dx.doi.org/10.1016/j.ipm.2014.10.006 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | 10.1016/j.ipm.2014.10.006 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:92764 |