Babych, B orcid.org/0000-0003-1872-1677 (2019) Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms using Inflection Tables and Unannotated Corpora. In: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing. BSNLP 2019: 7th Workshop on Balto-Slavic Natural Language Processing, 02 Aug 2019, Florence, Italy. Association for Computational Linguistics (ACL), pp. 1-11. ISBN 978-1-950737-41-3
Abstract
The paper presents an unsupervised method for quickly extending a Ukrainian lexicon by generating paradigms and morphological feature structures for new Named Entities and neologisms, which are not covered by existing static morphological resources. This approach addresses a practical problem of modelling paradigms for entities created by the dynamic processes in the lexicon: this problem is especially serious for highly inflected languages in domains with a specialised or quickly changing lexicon. The method uses an unannotated Ukrainian corpus and a small fixed set of inflection tables, which can be found in traditional grammar textbooks. The advantage of the proposed approach is that updating the morphological lexicon does not require training or linguistic annotation, allowing fast, knowledge-light extension of an existing static lexicon to improve morphological coverage on a specific corpus. The method is implemented in an open-source package in a GitHub repository. It can be applied to other low-resourced inflectional languages which have internet corpora and linguistic descriptions of their inflection system, following the example of inflection tables for Ukrainian. Evaluation results show consistent improvements in coverage for Ukrainian corpora of different corpus types.
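The abstract does not spell out the algorithm, so the following is only a minimal sketch of how inflection tables and an unannotated corpus might be combined to propose a paradigm for an unknown word. It is not the API of the released package: the function names (`candidate_paradigms`, `induce_paradigm`), the two heavily simplified inflection tables, and the scoring rule (count how many generated forms are attested in the corpus) are all assumptions made for illustration.

```python
from collections import Counter

# Toy inflection tables (assumed, heavily simplified): class -> {feature: ending}.
INFLECTION_TABLES = {
    "noun_fem_a": {"nom.sg": "а", "gen.sg": "и", "dat.sg": "і",
                   "acc.sg": "у", "ins.sg": "ою"},
    "noun_masc_hard": {"nom.sg": "", "gen.sg": "а", "dat.sg": "у",
                       "ins.sg": "ом"},
}


def candidate_paradigms(word):
    """Yield (class, stem, forms) for every table ending that matches the word."""
    for cls, table in INFLECTION_TABLES.items():
        for ending in set(table.values()):
            if word.endswith(ending) and len(word) > len(ending):
                stem = word[: len(word) - len(ending)]
                forms = {feat: stem + end for feat, end in table.items()}
                yield cls, stem, forms


def induce_paradigm(word, corpus_counts, known_lexicon):
    """Return (score, class, stem, forms) for the best-attested hypothesis.

    Returns None if the word is already in the static lexicon or if no
    inflection table matches it.
    """
    if word in known_lexicon:
        return None
    best = None
    for cls, stem, forms in candidate_paradigms(word):
        # Score a hypothesis by the number of distinct generated forms
        # that actually occur in the unannotated corpus.
        score = sum(1 for form in set(forms.values()) if corpus_counts[form] > 0)
        if best is None or score > best[0]:
            best = (score, cls, stem, forms)
    return best


# Usage: a tiny unannotated "corpus" containing forms of a new named entity.
tokens = ["Грета", "Грети", "Грету", "Гретою", "і", "в"]
corpus_counts = Counter(tokens)
result = induce_paradigm("Грету", corpus_counts, known_lexicon=set())
print(result[1], result[2])
```

Scoring by attested forms needs no training or annotation, which matches the knowledge-light setting described in the abstract; a fuller system would also have to handle ties, stem alternations, and noisy corpus tokens.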
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Babych, B (orcid.org/0000-0003-1872-1677) |
| Copyright, Publisher and Additional Information: | © 2019 Association for Computational Linguistics. This is an open access article under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/) |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures & Societies (Leeds) > Translation Studies (Leeds) |
| Depositing User: | Symplectic Publications |
| Date Deposited: | 02 Sep 2019 13:24 |
| Last Modified: | 02 Sep 2019 13:24 |
| Published Version: | https://aclweb.org/anthology/volumes/W19-37/ |
| Status: | Published |
| Publisher: | Association for Computational Linguistics (ACL) |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:150235 |