Atwell, ES, Demetriou, G, Hughes, J et al. (2 more authors) (2000) Comparing linguistic interpretation schemes for English corpora. In: Proceedings of the COLING-2000 Workshop on Linguistically Interpreted Corpora. COLING LINC-2000 Workshop on Linguistically Interpreted Corpora, 06 Aug 2000, University of Luxembourg, Luxembourg. Association for Computational Linguistics, pp. 1-10.
Abstract
Project AMALGAM explored a range of Part-of-Speech (PoS) tagsets and phrase structure parsing schemes used in modern English corpus-based research. Some of these PoS-tagging and parsing schemes have been used for hand annotation of corpora or for manual post-editing of the output of automatic taggers or parsers; others are the unedited output of a parsing program. Project deliverables include: a detailed description of each PoS-tagging scheme, and a multi-tagged corpus; a “corpus-neutral” tokenization scheme; a family of PoS-taggers for 8 PoS-tagsets; a method for “PoS-tagset conversion”; a sample of texts parsed according to a range of parsing schemes (a MultiTreebank); and an Internet service giving researchers worldwide free access to the above resources, including a simple email-based method for PoS-tagging any English text with any or all of the PoS-tagsets. We conclude that the tagging and parsing schemes in use are too varied to allow agreement on a standard, and that parser evaluation based on ‘bracket-matching’ is unfair to more sophisticated parsers.
Metadata
Field | Value |
---|---|
Item Type: | Proceedings Paper |
Copyright, Publisher and Additional Information: | (c) ACL, 2000. Reproduced with permission from the publisher. |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 13 Jan 2015 12:08 |
Last Modified: | 02 May 2015 01:41 |
Published Version: | http://www.aclweb.org/anthology/W00-1901 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82292 |