Onyenwe, I.E. and Hepple, M. orcid.org/0000-0003-1488-257X (2016) Predicting Morphologically-Complex Unknown Words in Igbo. In: Sojka, P., Horák, A., Kopeček, I. and Pala, K., (eds.) Text, Speech, and Dialogue. Text, Speech, and Dialogue (TSD 2016), 12-16 Sep 2016, Brno, Czech Republic. Lecture Notes in Computer Science, 9924. Springer International Publishing, pp. 206-214. ISBN 978-3-319-45510-5
Abstract
The effective handling of previously unseen words is an important factor in the performance of part-of-speech (POS) taggers. Some trainable POS taggers use suffix (and sometimes prefix) strings as cues for handling unknown words, in effect treating these strings as a proxy for actual linguistic affixes. In the context of creating a tagger for the African language Igbo, we compare the performance of some existing taggers that implement this approach with a novel method for handling morphologically-complex unknown words, based on morphological reconstruction (i.e. a linguistically-informed segmentation into root and affixes). The novel method outperforms these other systems by several percentage points, achieving accuracies of around 92% on morphologically-complex unknown words.
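The suffix-cue approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method or any particular tagger's implementation: the function names `train_suffix_guesser` and `guess_tag`, the `max_len` and `default` parameters, and the toy English training pairs are all assumptions made for illustration. The sketch only shows the general idea of guessing an unknown word's tag from the tags seen with its longest known word-final string.

```python
from collections import Counter, defaultdict

def train_suffix_guesser(tagged_words, max_len=3):
    """Count which tags co-occur with each word-final string of up to max_len characters."""
    table = defaultdict(Counter)
    for word, tag in tagged_words:
        for n in range(1, min(max_len, len(word)) + 1):
            table[word[-n:]][tag] += 1
    return table

def guess_tag(word, table, max_len=3, default="NOUN"):
    """Return the most frequent tag for the longest word-final string seen in training."""
    for n in range(min(max_len, len(word)), 0, -1):
        counts = table.get(word[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return default  # hypothetical fallback when no suffix matches

# Toy English data, invented for illustration only (not from the paper's Igbo corpus).
toy_training = [
    ("walking", "VERB"), ("talking", "VERB"), ("singing", "VERB"),
    ("quickly", "ADV"), ("slowly", "ADV"),
    ("kindness", "NOUN"), ("darkness", "NOUN"),
]
table = train_suffix_guesser(toy_training)
print(guess_tag("jumping", table))
print(guess_tag("sadly", table))
```

Such string-based cues know nothing about where the root ends and the affixes begin, which is the gap the abstract's morphological reconstruction method is described as addressing.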
Metadata
Item Type: | Proceedings Paper |
---|---|
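Authors/Creators: | Onyenwe, I.E.; Hepple, M. (orcid.org/0000-0003-1488-257X) |
Editors: | Sojka, P.; Horák, A.; Kopeček, I.; Pala, K. |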
Copyright, Publisher and Additional Information: | © 2016 Springer International Publishing. This is an author-produced version of a paper subsequently published in Text, Speech, and Dialogue (Lecture Notes in Computer Science). Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Morphology; Morphological reconstruction; Igbo; Unknown words prediction; Part-of-speech tagging |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 16 Jun 2017 14:01 |
Last Modified: | 19 Dec 2022 13:36 |
Published Version: | http://dx.doi.org/10.1007/978-3-319-45510-5_24 |
Status: | Published |
Publisher: | Springer International Publishing |
Series Name: | Lecture Notes in Computer Science |
Refereed: | Yes |
Identification Number: | 10.1007/978-3-319-45510-5_24 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:117819 |