Ezeani, I., Hepple, M. orcid.org/0000-0003-1488-257X and Onyenwe, I. (2016) Automatic Restoration of Diacritics for Igbo Language. In: Sojka, P., Horák, A., Kopeček, I. and Pala, K. (eds.) Text, Speech, and Dialogue. Text, Speech, and Dialogue (TSD 2016), 12-16 Sep 2016, Brno, Czech Republic. Lecture Notes in Computer Science, 9924. Springer International Publishing, pp. 198-205. ISBN 978-3-319-45510-5
Abstract
Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distinctions between words that are important for both meaning and pronunciation, and hence of potential value for a range of language processing tasks. Such diacritics, however, are often largely absent from the electronic texts we might want to process or assemble into corpora, and so the need arises for effective methods for automatic diacritic restoration for Igbo. In this paper, we experiment with an Igbo Bible corpus, which is extensively marked for vowel distinctions and partially marked for tonal distinctions, and attempt the task of reinstating these diacritics when they have been deleted. We investigate a number of word-level diacritic restoration methods, based on n-grams, under a closed-world assumption, achieving an accuracy of 98.83 % with our most effective method.
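As a rough illustration of what word-level, n-gram-based restoration under a closed-world assumption can look like, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the models, so the bigram-with-unigram-fallback scoring, the function names (`strip_diacritics`, `train`, `restore`), and the tiny toy corpus are all assumptions made for illustration. The toy sentences are invented token sequences, not lines from the Bible corpus.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Remove combining marks (tone marks, dot-below) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Count marked variants per stripped form, plus bigrams of marked words."""
    unigrams = defaultdict(Counter)  # stripped form -> Counter of marked forms
    bigrams = Counter()              # (previous marked word, marked word) -> count
    for sentence in sentences:
        prev = "<s>"
        for word in sentence:
            word = unicodedata.normalize("NFC", word)
            unigrams[strip_diacritics(word)][word] += 1
            bigrams[(prev, word)] += 1
            prev = word
    return unigrams, bigrams


def restore(tokens, unigrams, bigrams):
    """Pick, for each stripped token, the variant best supported by the
    previous restored word (bigram count), falling back to unigram count.
    Closed-world assumption: tokens never seen in training are left as-is."""
    restored = []
    prev = "<s>"
    for token in tokens:
        candidates = unigrams.get(token)
        if not candidates:
            best = token
        else:
            best = max(candidates,
                       key=lambda c: (bigrams[(prev, c)], candidates[c]))
        restored.append(best)
        prev = best
    return restored


# Hypothetical toy corpus (illustrative tokens only).
toy_corpus = [
    ["o", "na-ebe", "ákwá"],
    ["o", "riri", "àkwá"],
    ["o", "riri", "àkwá"],
    ["o", "zụrụ", "ákwà"],
]
unigrams, bigrams = train(toy_corpus)

print(restore(["o", "riri", "akwa"], unigrams, bigrams))
print(restore(["o", "zuru", "akwa"], unigrams, bigrams))
```

In this sketch the ambiguous stripped form `akwa` is resolved differently depending on the preceding word, which is the kind of contextual disambiguation that n-gram methods are meant to provide; a real system would need smoothing and a far larger training corpus.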
Metadata
Item Type: | Proceedings Paper |
---|---|
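Authors/Creators: | Ezeani, I.; Hepple, M. (orcid.org/0000-0003-1488-257X); Onyenwe, I. |
Editors: | Sojka, P.; Horák, A.; Kopeček, I.; Pala, K. |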
Copyright, Publisher and Additional Information: | © 2016 Springer International Publishing. This is an author-produced version of a paper subsequently published in Text, Speech, and Dialogue (Lecture Notes in Computer Science). Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Diacritic restoration; Sense disambiguation; Low resourced languages; Igbo language |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 16 Jun 2017 12:08 |
Last Modified: | 19 Dec 2022 13:36 |
Published Version: | http://dx.doi.org/10.1007/978-3-319-45510-5_23 |
Status: | Published |
Publisher: | Springer International Publishing |
Series Name: | Lecture Notes in Computer Science |
Refereed: | Yes |
Identification Number: | 10.1007/978-3-319-45510-5_23 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:117833 |