Boito, M.Z., Bérard, A., Villavicencio, A. orcid.org/0000-0002-3731-9168 et al. (1 more author) (2018) Unwritten languages demand attention too! Word discovery with encoder-decoder models. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 16-20 Dec 2017, Okinawa, Japan. IEEE, pp. 458-465. ISBN 9781509047895
Abstract
Word discovery is the task of extracting words from un-segmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten language scenario, where only small corpora and limited annotations are available. We investigate two scenarios: one with no supervision and another with limited supervision with access to the most frequent words. Obtained results show that it is possible to retrieve at least 27% of the gold standard vocabulary by training an encoder-decoder neural machine translation system with only 5,157 sentences. This result is close to those obtained with a task-specific Bayesian nonparametric model. Moreover, our approach has the advantage of generating translation alignments, which could be used to create a bilingual lexicon. As a future perspective, this approach is also well suited to work directly from speech.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2017 IEEE. |
Keywords: | Word Discovery; Computational Language Documentation; Neural Machine Translation; Attention models |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 25 Nov 2019 11:51 |
Last Modified: | 25 Nov 2019 11:51 |
Status: | Published |
Publisher: | IEEE |
Refereed: | Yes |
Identification Number: | 10.1109/ASRU.2017.8268972 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:153555 |