Atwell, ES (2007) A cross-language methodology for corpus part-of-speech tag-set development. In: Proceedings of the CL'2007 Corpus Linguistics Conference. CL'2007 Corpus Linguistics Conference, 27-30 Jul 2007, University of Birmingham, UK. UCREL, Lancaster University
Abstract
This paper examines the criteria used in developing corpus Part-of-Speech (PoS) tag-sets for PoS-tagging a corpus, that is, enriching a corpus by adding a part-of-speech category label to each word. PoS-tagging requires three things: a tag-set, a list of grammatical category labels; a tagging scheme, practical definitions of each tag or label, showing the words and contexts where each tag applies; and a tagger, a program that assigns a tag to each word in the corpus, implementing the tag-set and tagging scheme in a tag-assignment algorithm.
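The three components named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the definitions, the lexicon, and the `tag` function are hypothetical and are not taken from the paper, and a simple lexicon lookup stands in for whatever tag-assignment algorithm a real tagger would use.

```python
# Toy illustration only: tag names, definitions, lexicon, and the lookup
# strategy are hypothetical and are not taken from the paper.

# Tag-set: the list of grammatical category labels.
TAG_SET = ["DET", "NOUN", "VERB", "ADJ", "UNK"]

# Tagging scheme: a practical definition of each tag, with example words.
TAGGING_SCHEME = {
    "DET": "determiner, e.g. 'the', 'a'",
    "NOUN": "common noun, e.g. 'corpus', 'tagger'",
    "VERB": "verb, e.g. 'tags'",
    "ADJ": "adjective, e.g. 'large'",
    "UNK": "fallback for words the lexicon does not cover",
}

# Lexicon consulted by the toy tagger (lower-cased word -> tag).
LEXICON = {
    "the": "DET",
    "a": "DET",
    "corpus": "NOUN",
    "tagger": "NOUN",
    "tags": "VERB",
    "large": "ADJ",
}


def tag(tokens):
    """Assign one tag from TAG_SET to each token by lexicon lookup."""
    return [(token, LEXICON.get(token.lower(), "UNK")) for token in tokens]


if __name__ == "__main__":
    for token, label in tag("The tagger tags a large corpus".split()):
        print(f"{token}\t{label}\t{TAGGING_SCHEME[label]}")
```

A lookup like this ignores context, so it cannot disambiguate a word that belongs to more than one category; the tagging scheme's "words and contexts" definitions are what a fuller tagger would need to implement.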
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Atwell, ES |
Copyright, Publisher and Additional Information: | (c) 2007, UCREL. Reproduced in accordance with the publisher's self-archiving policy. |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 16 Jan 2015 11:48 |
Last Modified: | 19 Dec 2022 13:30 |
Published Version: | http://ucrel.lancs.ac.uk/publications/CL2007/paper... |
Status: | Published |
Publisher: | UCREL, Lancaster University |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82300 |