Onyenwe, I., Hepple, M. (orcid.org/0000-0003-1488-257X), Uchechukwu, C. et al. (1 more author) (2015) Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language. In: Nakov, P., Zampieri, M., Osenova, P., Tan, L., Vertan, C., Ljubešić, N. and Tiedemann, J. (eds.) Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects. Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, September 10, 2015, Hissar, Bulgaria. Association for Computational Linguistics, pp. 24-33.
Abstract
The accuracy of an annotated corpus can be increased through evaluation and revision of the annotation scheme, and through adjudication of the disagreements found. In this paper, we describe a novel process that has been applied to improve a part-of-speech (POS) tagged corpus for the African language Igbo. An inter-annotation agreement (IAA) exercise was undertaken to iteratively revise the tagset used in the creation of the initial tagged corpus, with the aim of refining the tagset and maximizing annotator performance. The tagset revisions and other corrections were efficiently propagated to the overall corpus in a semi-automated manner using transformation-based learning (TBL) to identify candidates for correction and to propose possible tag corrections. The affected word-tag pairs in the corpus were inspected to ensure a high quality end-product with an accuracy that would not be achieved through a purely automated process. The results show that the tagging accuracy increases from 88% to 94%. The tagged corpus is potentially re-usable for other dialects of the language.
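The semi-automated propagation step described in the abstract can be illustrated with a minimal sketch of transformation-based learning. This is not the authors' implementation: the single rule template (change tag X to Y when the previous tag is Z), the tag names, the `START` marker and all function names are assumptions chosen for illustration. Rules are learned greedily from a small re-annotated sample and then used only to propose corrections, which a human would still inspect.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def apply_rule(tags, rule):
    """Apply one rule (from_tag, to_tag, prev_tag) to a sentence's tags.

    Context is read from the tags as they were before this rule fired.
    """
    frm, to, prev = rule
    out = list(tags)
    for i, t in enumerate(tags):
        p = tags[i - 1] if i > 0 else START
        if t == frm and p == prev:
            out[i] = to
    return out


def learn_rules(current, gold, max_rules=10):
    """Greedily learn rules that move `current` tags towards `gold` tags.

    Both arguments are parallel lists of sentences (lists of tags).
    """
    current = [list(s) for s in current]  # work on a copy
    rules = []
    for _ in range(max_rules):
        score = Counter()
        targets = defaultdict(set)
        # Count the errors each candidate rule would fix.
        for cur, ref in zip(current, gold):
            for i, (c, g) in enumerate(zip(cur, ref)):
                if c != g:
                    prev = cur[i - 1] if i > 0 else START
                    score[(c, g, prev)] += 1
                    targets[(c, prev)].add(g)
        # Penalise each candidate for the correct tags it would break.
        for cur, ref in zip(current, gold):
            for i, (c, g) in enumerate(zip(cur, ref)):
                if c == g:
                    prev = cur[i - 1] if i > 0 else START
                    for to in targets.get((c, prev), ()):
                        score[(c, to, prev)] -= 1
        if not score:
            break
        best, gain = score.most_common(1)[0]
        if gain <= 0:
            break
        rules.append(best)
        current = [apply_rule(s, best) for s in current]
    return rules


def propose_corrections(tags, rules):
    """Return (position, old_tag, proposed_tag) candidates for human review."""
    new = list(tags)
    for rule in rules:
        new = apply_rule(new, rule)
    return [(i, o, n) for i, (o, n) in enumerate(zip(tags, new)) if o != n]


# Toy sample with made-up tags: old annotation vs. revised annotation.
old_sample = [["PRN", "N", "N"], ["DET", "N"], ["PRN", "N"]]
new_sample = [["PRN", "V", "N"], ["DET", "N"], ["PRN", "V"]]
rules = learn_rules(old_sample, new_sample)
candidates = propose_corrections(["PRN", "N", "DET", "N"], rules)
```

In this sketch the learned rules never overwrite the corpus directly; `propose_corrections` only lists the word positions whose tags a rule would change, mirroring the abstract's point that affected word-tag pairs are inspected rather than corrected fully automatically.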
Metadata
Item Type: | Proceedings Paper |
---|---|
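Authors/Creators: | Onyenwe, I., Hepple, M. (orcid.org/0000-0003-1488-257X), Uchechukwu, C. et al. (1 more author) |
Editors: | Nakov, P., Zampieri, M., Osenova, P., Tan, L., Vertan, C., Ljubešić, N. and Tiedemann, J. |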
Copyright, Publisher and Additional Information: | © 2015 Association for Computational Linguistics. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License (https://creativecommons.org/licenses/by-nc-sa/3.0/). Permission is granted to make copies for the purposes of teaching and research. |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 17 Jul 2017 09:36 |
Last Modified: | 17 Jul 2017 09:36 |
Published Version: | http://aclweb.org/anthology/W15-5405 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:117792 |