Yue, Z., Christensen, H. and Barker, J. (2020) Autoencoder bottleneck features with multi-task optimisation for improved continuous dysarthric speech recognition. In: Proceedings of Interspeech 2020. Interspeech 2020, 25-29 Oct 2020, Shanghai, China (Online). International Speech Communication Association (ISCA), pp. 4581-4585.
Abstract
Automatic recognition of dysarthric speech is a very challenging research problem in which performance still lags far behind that achieved for typical speech. The main reason is the lack of suitable training data to accommodate the large mismatch between dysarthric and typical speech. Only recently has the focus moved from single-word tasks to the continuous automatic speech recognition (ASR) needed for dictation and most voice-enabled interfaces. This paper investigates improvements to dysarthric continuous ASR. In particular, we demonstrate the effectiveness of using an unsupervised autoencoder-based bottleneck (AE-BN) feature extractor trained on out-of-domain (OOD) LibriSpeech data. We further explore multi-task optimisation techniques shown to benefit typical speech ASR. We propose a 5-fold cross-training setup on the widely used TORGO dysarthric database, a setup we believe is more suitable for this low-resource data domain. Results show that adding the proposed AE-BN features achieves an average absolute word error rate (WER) improvement of 2.63% compared to the baseline system. Further absolute WER reductions of 2.33% and 0.65% are seen when applying monophone regularisation and joint optimisation techniques, respectively. In general, the ASR system employing monophone regularisation trained on AE-BN features exhibits the best performance.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Yue, Z.; Christensen, H.; Barker, J. |
Copyright, Publisher and Additional Information: | © 2020 ISCA. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | continuous dysarthric speech recognition; autoencoder bottleneck features; multi-task optimisation |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: European Commission - Horizon 2020; Grant number: 766287 - TAPAS |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 10 Aug 2020 12:43 |
Last Modified: | 13 Jan 2021 11:02 |
Status: | Published |
Publisher: | International Speech Communication Association (ISCA) |
Refereed: | Yes |
Identification Number: | 10.21437/Interspeech.2020-2746 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:164230 |