Xiong, F., Barker, J., Yue, Z. et al. (1 more author) (2020) Source domain data selection for improved transfer learning targeting dysarthric speech recognition. In: Proceedings of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020). ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, 04-08 May 2020, Barcelona, Spain. IEEE, pp. 7424-7428. ISBN 9781509066322
Abstract
This paper presents an improved transfer learning framework for building robust personalised speech recognition models for speakers with dysarthria. As the transfer learning baseline, a state-of-the-art CNN-TDNN-F ASR acoustic model trained solely on source domain data is adapted to the target domain via neural network weight adaptation, using the limited data available from the target dysarthric speakers. Results on the UASpeech corpus show that the linear weights in the neural layers play the most important role in improving the modelling of dysarthric speech, yielding average relative recognition improvements of 11.6% and 7.6% over conventional speaker-dependent training and data combination, respectively. To further improve transferability towards the target domain, we propose utterance-based selection of source domain data based on the entropy of the posterior probability, which statistical analysis shows to follow a Gaussian distribution. Compared with speaker-based data selection via a dysarthria similarity measure, this allows a more accurate selection of potentially beneficial source domain data, used either to enlarge the target domain training pool or to construct an intermediate domain for incremental transfer learning. This yields a further absolute recognition improvement of nearly 2% over the transfer learning baseline for speakers with moderate to severe dysarthria.
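The entropy-based selection described in the abstract can be illustrated with a minimal sketch. It assumes each source domain utterance already has a matrix of frame-level posterior probabilities (frames by output classes). It also assumes that an utterance is summarised by its mean frame entropy, that a Gaussian is fitted to these scores via sample mean and standard deviation, and that utterances at or below a chosen z-score are kept. The summary statistic, the threshold `z_max`, the direction of selection, and all function names are hypothetical illustrations, not the authors' published criterion.

```python
import numpy as np


def frame_entropies(posteriors: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (in nats) of each frame's posterior distribution.

    posteriors: array of shape (num_frames, num_classes); each row sums to 1.
    Values are clipped to eps to avoid log(0).
    """
    p = np.clip(posteriors, eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)


def utterance_entropy(posteriors: np.ndarray) -> float:
    """Hypothetical utterance-level score: mean frame entropy."""
    return float(np.mean(frame_entropies(posteriors)))


def select_utterances(utt_posteriors: dict, z_max: float = 0.0) -> list:
    """Fit a Gaussian to utterance entropies and keep low-entropy utterances.

    Assumption (not from the paper): utterances whose entropy z-score is
    <= z_max are treated as the potentially beneficial source domain data.
    """
    ids = sorted(utt_posteriors)
    ent = np.array([utterance_entropy(utt_posteriors[u]) for u in ids])
    mu, sigma = ent.mean(), ent.std()  # maximum-likelihood Gaussian fit
    if sigma == 0.0:
        return ids  # all scores identical: nothing to discriminate
    z = (ent - mu) / sigma
    return [u for u, zi in zip(ids, z) if zi <= z_max]


if __name__ == "__main__":
    # Toy posteriors over 4 classes for three hypothetical utterances.
    utts = {
        "a": np.array([[0.97, 0.01, 0.01, 0.01]] * 5),  # confident
        "b": np.array([[0.25, 0.25, 0.25, 0.25]] * 5),  # maximally uncertain
        "c": np.array([[0.70, 0.10, 0.10, 0.10]] * 5),  # in between
    }
    print(select_utterances(utts))
```

In practice the posteriors would come from an acoustic model, and the selected utterances would then either be added to the target domain training pool or form the intermediate domain, as the abstract describes.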
Metadata
Field | Value |
---|---|
Item Type: | Proceedings Paper |
Copyright, Publisher and Additional Information: | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | Transfer learning; data selection; entropy; posterior probability; dysarthric speech recognition |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Feb 2020 14:40 |
Last Modified: | 14 May 2021 00:38 |
Status: | Published |
Publisher: | IEEE |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP40776.2020.9054694 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:156702 |