Boito, M.Z., Villavicencio, A. orcid.org/0000-0002-3731-9168 and Besacier, L. (2021) Investigating alignment interpretability for low-resource NMT. Machine Translation, 34. pp. 305-323. ISSN 0922-6567
Abstract
The attention mechanism in Neural Machine Translation (NMT) models added flexibility to translation systems and made it possible to visualize soft-alignments between source and target representations. While there is much debate about the relationship between attention and model output (Jain and Wallace 2019; Serrano and Smith 2019; Wiegreffe and Pinter 2019; Vashishth et al. 2019), in this paper we propose a different assessment, investigating soft-alignment interpretability in low-resource scenarios. We experimented with different architectures (RNN (Bahdanau et al. 2015), 2D-CNN (Elbayad et al. 2018), and Transformer (Vaswani et al. 2017)), comparing them with regard to their ability to produce directly exploitable alignments. To evaluate exploitability, we replicated the Unsupervised Word Segmentation (UWS) task from Godard et al. (2018), in which source words are translated into unsegmented phone sequences. After training, the resulting soft-alignments are used to produce a segmentation over the target side. Our results showed that an RNN-based NMT model produced the most exploitable alignments in this scenario. We then investigated methods for increasing its UWS scores by comparing the following methodologies: monolingual pre-training, input representation augmentation (hybrid model), and explicit word length optimization during training. We reached the best results with the hybrid model, which uses an intermediate monolingual-rooted segmentation from a non-parametric Bayesian model (Goldwater 2007) to enrich the input representation before training.
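To illustrate the UWS step described above, the sketch below shows one simple way a soft-alignment (attention) matrix could be turned into a target-side segmentation: assign each phone to the source word it attends to most, and insert a word boundary wherever that assignment changes. This is a minimal illustrative sketch, not the authors' implementation; the function name and the toy attention matrix are assumptions for demonstration only.

```python
import numpy as np

def segment_from_attention(attention, phones):
    """Derive target-side word boundaries from a soft-alignment matrix.

    attention: (num_phones, num_source_words) array of attention weights.
    phones: list of phone symbols, one per attention row.
    A boundary is inserted whenever the most-attended source word changes.
    """
    alignments = attention.argmax(axis=1)  # best source word per phone
    segments, current = [], [phones[0]]
    for prev, cur, phone in zip(alignments, alignments[1:], phones[1:]):
        if cur != prev:                    # attended word changed: new segment
            segments.append("".join(current))
            current = [phone]
        else:
            current.append(phone)
    segments.append("".join(current))
    return segments

# toy example: 5 phones attending to 2 source words
att = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
])
print(segment_from_attention(att, ["t", "a", "k", "o", "n"]))
# → ['ta', 'kon']
```

The segmentation quality of such a procedure depends directly on how sharp and monotone the attention distributions are, which is why the paper compares architectures on alignment exploitability.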
Metadata
Item Type: Article
Authors/Creators: Boito, M.Z.; Villavicencio, A.; Besacier, L.
Copyright, Publisher and Additional Information: © The Author(s), under exclusive licence to Springer Nature B.V. part of Springer Nature 2021. This is an author-produced version of a paper subsequently published in Machine Translation. Uploaded in accordance with the publisher's self-archiving policy.
Keywords: low-resource languages; attention mechanism; sequence-to-sequence models; unsupervised word segmentation; computational language documentation; neural machine translation
Dates: Published 2021
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information: Engineering and Physical Sciences Research Council, grant number EP/T02450X/1
Depositing User: Symplectic Sheffield
Date Deposited: 19 Oct 2020 10:29
Last Modified: 06 Feb 2022 01:38
Status: Published
Publisher: Springer
Identification Number (DOI): 10.1007/s10590-020-09254-w
Refereed: Yes
Open Archives Initiative ID (OAI ID): oai:eprints.whiterose.ac.uk:166668