Derczynski, L., Augenstein, I. and Bontcheva, K. (2015) USFD: Twitter NER with Drift Compensation and Linked Data. In: Proceedings of the ACL 2015 Workshop on Noisy User-generated Text. ACL 2015 Workshop on Noisy User-generated Text (W-NUT), 31 July 2015, Beijing, China. Association for Computational Linguistics, pp. 48-53.
Abstract
This paper describes a pilot NER system for Twitter, comprising the USFD system entry to the W-NUT 2015 NER shared task. The goal is to correctly label entities in a tweet dataset, using an inventory of ten types. We employ structured learning, drawing on gazetteers taken from Linked Data, and on unsupervised clustering features, and attempting to compensate for stylistic and topic drift - a key challenge in social media text. Our result is competitive; we provide an analysis of the components of our methodology, and an examination of the target dataset in the context of this task.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Derczynski, L., Augenstein, I. and Bontcheva, K. |
Copyright, Publisher and Additional Information: | © 2015 Association for Computational Linguistics. Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (https://creativecommons.org/licenses/by-nc-sa/3.0/). Permission is granted to make copies for the purposes of teaching and research. ACL Anthology: http://www.aclweb.org/anthology/index.html |
Keywords: | cs.CL |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 29 Jan 2016 15:53 |
Last Modified: | 11 Feb 2021 15:20 |
Published Version: | http://www.aclweb.org/anthology/W/W15/W15-4306.pdf |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:92763 |