Raw source and filter modelling for dysarthric speech recognition

Yue, Z., Loweimi, E. and Cvetkovic, Z. (2022) Raw source and filter modelling for dysarthric speech recognition. In: Proceedings of ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23-27 May 2022, Singapore, Singapore. IEEE, pp. 7377-7381.

Abstract

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem, and substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we build acoustic models using the raw magnitude spectra of the source and filter components. The proposed multi-stream model consists of convolutional and recurrent layers. It allows for fusing the vocal tract and excitation components at different levels of abstraction and after per-stream pre-processing. We show that such multi-stream processing leverages these two information streams and helps the model towards normalising the speaker attributes and speaking style. This potentially leads to better handling of dysarthric speech with its large inter-speaker and intra-speaker variability. We compare the proposed system with various features, study the training dynamics, explore the usefulness of data augmentation and provide an interpretation of the learned convolutional filters. On the widely used TORGO dysarthric speech corpus, the proposed approach results in up to 1.7% absolute WER reduction for dysarthric speech compared with the MFCC baseline. Our best model reaches up to 40.6% and 11.8% WER for dysarthric and typical speech, respectively.
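
The abstract does not say how the source and filter magnitude spectra are obtained. As a rough illustration only, the sketch below uses cepstral liftering, one common way to split a frame's spectrum into a smooth envelope (vocal tract, "filter") and a fine-structure residual (excitation, "source"). The function name `source_filter_split`, the FFT size, the Hamming window and the lifter cutoff are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def source_filter_split(frame, n_fft=512, lifter_cutoff=30, eps=1e-10):
    """Split one frame's magnitude spectrum into filter and source parts.

    Hypothetical helper: cepstral liftering is assumed here, not confirmed
    as the paper's method. Returns (filter_mag, source_mag), each of
    length n_fft // 2 + 1, whose product equals the frame's magnitude
    spectrum (plus eps).
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + eps)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)

    # Symmetric low-time lifter: keep low quefrencies and their mirror image.
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0
    lifter[n_fft - lifter_cutoff + 1:] = 1.0

    # Smooth spectral envelope (filter) and what remains (source).
    log_filter = np.fft.rfft(cepstrum * lifter).real
    log_source = log_mag - log_filter
    return np.exp(log_filter), np.exp(log_source)

# Usage on a synthetic 25 ms frame at 16 kHz (400 samples of noise).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
filter_mag, source_mag = source_filter_split(frame)
```

In a multi-stream setup of the kind the abstract describes, the two returned spectra would be fed to separate streams before fusion; the cutoff controls how much spectral detail is assigned to the filter stream.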

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Yue, Z.; Loweimi, E.; Cvetkovic, Z.
Copyright, Publisher and Additional Information:	© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy.
Keywords:	Dysarthric speech recognition; source-filter separation and fusion; multi-stream acoustic modelling
Dates:	Published (online): 27 April 2022; Published: 27 April 2022
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information:	Funder: EUROPEAN COMMISSION - HORIZON 2020; Grant number: 766287 - TAPAS
Depositing User:	Symplectic Sheffield
Date Deposited:	27 Oct 2022 12:02
Last Modified:	27 Apr 2023 00:13
Status:	Published
Publisher:	IEEE
Refereed:	Yes
Identification Number:	10.1109/icassp43922.2022.9746553
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:192462
