Knill, K.M., Gales, M.J.F., Ragni, A. orcid.org/0000-0003-0634-4456 et al. (1 more author) (2014) Language independent and unsupervised acoustic models for speech recognition and keyword spotting. In: INTERSPEECH 2014 : 15th Annual Conference of the International Speech Communication Association. INTERSPEECH 2014 : 15th Annual Conference of the International Speech Communication Association, 14-18 Sep 2014, Singapore. International Speech Communication Association (ISCA)
Abstract
Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models (AM) are then trained. This work considers a particular scenario where the target language is unseen in multi-language training and has limited language model training data, a limited lexicon, and acoustic training data without transcriptions. A zero acoustic resources case is first described where a multilanguage AM is directly applied, as a language independent AM (LIAM), to an unseen language. Secondly, in an unsupervised approach a LIAM is used to obtain hypotheses for the target language acoustic data transcriptions which are then used in training a language dependent AM. 3 languages from the IARPA Babel project are used for assessment: Vietnamese, Haitian Creole and Bengali. Performance of the zero acoustic resources system is found to be poor, with keyword spotting at best 60% of language dependent performance. Unsupervised language dependent training yields performance gains. For one language (Haitian Creole) the Babel target is achieved on the in-vocabulary data.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2014 International Speech Communication Association. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | speech recognition; low resource; multilingual |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 12 Nov 2019 15:44 |
Last Modified: | 12 Nov 2019 15:44 |
Published Version: | https://www.isca-speech.org/archive/interspeech_20... |
Status: | Published |
Publisher: | International Speech Communication Association (ISCA) |
Refereed: | Yes |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:152842 |