Yue, Z., Xiong, F., Christensen, H. et al. (1 more author) (2020) Exploring appropriate acoustic and language modelling choices for continuous dysarthric speech recognition. In: Proceedings of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020). ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, 04-08 May 2020, Barcelona, Spain. IEEE , pp. 6094-6098. ISBN 9781509066322
Abstract
There has been much recent interest in building continuous speech recognition systems for people with severe speech impairments, e.g., dysarthria. However, the datasets that are commonly used are typically designed for tasks other than ASR development, or they contain only isolated words. As such, they contain much overlap in the prompts read by the speakers. Previous ASR evaluations have often neglected this, using language models (LMs) trained on non-disjoint training and test data, potentially producing unrealistically optimistic results. In this paper, we investigate the impact of LM design using the widely used TORGO database. We combine state-of-the-art acoustic models with LMs trained with data originating from LibriSpeech. Using LMs with varying vocabulary size, we examine the trade-off between the out-of-vocabulary rate and recognition confusions for speakers with varying degrees of dysarthria. It is found that the optimal LM complexity is highly speaker dependent, highlighting the need to design speaker-dependent LMs alongside speaker-dependent acoustic models when considering atypical speech.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | Continuous dysarthric speech recognition; language modelling; out-of-domain data |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Feb 2020 14:33 |
Last Modified: | 14 May 2021 00:38 |
Status: | Published |
Publisher: | IEEE |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP40776.2020.9054343 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:156701 |