Ezeani, I., Hepple, M.R. orcid.org/0000-0003-1488-257X and Onyenwe, I. (2017) Lexical Disambiguation of Igbo using Diacritic Restoration. In: Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications. Conference of the European Chapter of the Association for Computational Linguistics, 03-07 Apr 2017, Valencia, Spain. Association for Computational Linguistics, pp. 53-60.
Abstract
Properly written texts in Igbo, a low resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing the distinctions in pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in corpus building and language processing tasks for languages with diacritics. In our previous work, we built some n-gram models with simple smoothing techniques based on a closed-world assumption. However, as a classification task, diacritic restoration is well suited for and will be more generalisable with machine learning. This paper, therefore, presents a more standard approach to dealing with the task which involves the application of machine learning algorithms.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Ezeani, I., Hepple, M.R. and Onyenwe, I. |
Copyright, Publisher and Additional Information: | © 2017 Association for Computational Linguistics. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 License. |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 21 Sep 2017 15:03 |
Last Modified: | 21 Sep 2017 15:04 |
Published Version: | http://aclweb.org/anthology/W17-1907 |
Status: | Published online |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/W17-1907 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:121255 |