García-Martínez, M., Aransa, W., Bougares, F. et al. (1 more author) (2020) Addressing data sparsity for neural machine translation between morphologically rich languages. Machine Translation, 34 (1). pp. 1-20. ISSN 0922-6567
Abstract
Translating between morphologically rich languages is still challenging for current machine translation systems. In this paper, we experiment with various neural machine translation (NMT) architectures to address the data sparsity problem caused by data availability (quantity), domain shift and the languages involved (Arabic and French). We show that the Factored NMT (FNMT) model, which uses linguistically motivated factors, is able to outperform standard NMT systems using subword units by more than 1 BLEU point even when a large quantity of data is available. Our work shows the benefits of applying linguistic factors in NMT when faced with low- and high-resource conditions.
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2020 Springer Nature B.V. This is an author-produced version of a paper subsequently published in Machine Translation. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Neural machine translation; Factored models; Deep learning |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 27 Mar 2020 14:28 |
Last Modified: | 02 Dec 2021 11:55 |
Status: | Published |
Publisher: | Springer Science and Business Media LLC |
Refereed: | Yes |
Identification Number: | 10.1007/s10590-019-09242-9 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:158785 |