Atwell, ES (2007) A cross-language methodology for corpus part-of-speech tag-set development. In: Proceedings of the CL'2007 Corpus Linguistics Conference. CL'2007 Corpus Linguistics Conference, 27-30 Jul 2007, University of Birmingham, UK. UCREL, Lancaster University
Abstract
This paper examines the criteria used to develop corpus Part-of-Speech (PoS) tag-sets for PoS-tagging a corpus, that is, enriching a corpus by adding a part-of-speech category label to each word. PoS-tagging requires a tag-set, a list of grammatical category labels; a tagging scheme, practical definitions of each tag or label, showing words and contexts where each tag applies; and a tagger, a program for assigning a tag to each word in the corpus, implementing the tag-set and tagging scheme in a tag-assignment algorithm.
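The three components named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the definitions, the toy lexicon, and the dictionary-lookup strategy are hypothetical and are not taken from the paper, which does not prescribe a particular tagger.

```python
# Hypothetical toy example: tags, definitions and lexicon are invented
# for illustration and are not the tag-set discussed in the paper.

# Tag-set: the list of grammatical category labels.
TAG_SET = {"DET", "NOUN", "VERB", "UNK"}

# Tagging scheme: a practical definition of each tag, with an example context.
TAGGING_SCHEME = {
    "DET": "determiner, e.g. 'the' in 'the cat'",
    "NOUN": "common noun, e.g. 'cat' in 'the cat'",
    "VERB": "main verb, e.g. 'sleeps' in 'the cat sleeps'",
    "UNK": "word not covered by the toy lexicon",
}

# Toy lexicon used by the tagger (assumed; real taggers also use context).
LEXICON = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}


def tag(tokens):
    """Tagger: assign one tag to each token by lexicon lookup, else UNK."""
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


tagged = tag("The cat sleeps".split())
print(tagged)
```

A lookup tagger of this kind ignores context, so it cannot resolve ambiguous words; it is shown only to make the division of labour between tag-set, tagging scheme, and tagger concrete.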
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Atwell, ES |
| Copyright, Publisher and Additional Information: | (c) 2007, UCREL. Reproduced in accordance with the publisher's self-archiving policy. |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
| Depositing User: | Symplectic Publications |
| Date Deposited: | 16 Jan 2015 11:48 |
| Last Modified: | 19 Dec 2022 13:30 |
| Published Version: | http://ucrel.lancs.ac.uk/publications/CL2007/paper... |
| Status: | Published |
| Publisher: | UCREL, Lancaster University |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82300 |