Hughes, J, Souter, DC and Atwell, ES (1995) Automatic extraction of tagset mappings from parallel annotated corpora. In: Tzoukermann, E and Armstrong, S, (eds.) Proceedings of the ACL-SIGDAT Workshop From Text to Tags: Issues in Multilingual Language Analysis. ACL-SIGDAT Workshop From Text to Tags: Issues in Multilingual Language Analysis, 27 March 1995, Dublin, Ireland. Association for Computational Linguistics, pp. 10-17.
Abstract
Several research projects around the world are building grammatically analysed corpora; that is, collections of text annotated with part-of-speech wordtags and syntax trees. However, projects have used quite different wordtagging and parsing schemes. Developers of corpora adhere to a variety of competing models or theories of grammar and parsing, with the effect of restricting the accessibility of their respective corpora, and the potential for collation into a single fully parsed corpus. In view of this heterogeneity, we have begun to investigate and develop methods of automatically mapping between the annotation schemes of the most widely known corpora, thus assessing their differences and improving their reusability. Annotating a single corpus with the different schemes allows for comparisons and will provide a rich test-bed for automatic parsers. Collation of all the included corpora into a single large annotated corpus will allow a more detailed language model to be developed for tasks such as speech and handwriting recognition. This paper focuses on methods of developing mappings between tagsets and, in particular, the method of automatic extraction of mappings from corpora tagged with more than one annotation scheme.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Hughes, J; Souter, DC; Atwell, ES |
Editors: | Tzoukermann, E; Armstrong, S |
Copyright, Publisher and Additional Information: | (c) 1995, Association for Computational Linguistics. Reproduced with permission from the publisher. |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 20 Jan 2015 11:25 |
Last Modified: | 08 May 2015 12:52 |
Published Version: | http://aclweb.org/anthology/ |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82967 |