Ravenscroft, W. orcid.org/0000-0002-0780-3303, Close, G., Goetze, S. et al. (4 more authors) (2024) Transcription-free fine-tuning of speech separation models for noisy and reverberant multi-speaker automatic speech recognition. In: Proceedings of Interspeech 2024. Interspeech 2024, 01-05 Sep 2024, Kos Island, Greece. International Speech Communication Association (ISCA), pp. 4998-5002.
Abstract
One solution to automatic speech recognition (ASR) of overlapping speakers is to separate speech and then perform ASR on the separated signals. Commonly, the separator produces artefacts which often degrade ASR performance. Addressing this issue typically requires reference transcriptions to jointly train the separation and ASR networks. This is often not viable for training on real-world in-domain audio where reference transcript information is not always available. This paper proposes a transcription-free method for joint training using only audio signals. The proposed method uses embedding differences of pre-trained ASR encoders as a loss with a proposed modification to permutation invariant training (PIT) called guided PIT (GPIT). The method achieves a 6.4% improvement in word error rate (WER) measures over a signal-level loss and also shows enhancement improvements in perceptual measures such as short-time objective intelligibility (STOI).
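As a rough illustration of the loss described in the abstract, the sketch below combines standard permutation invariant training (PIT) with a distance between encoder embeddings of the separated and reference audio. It is not the authors' implementation: the abstract does not specify how GPIT modifies PIT, so only plain PIT is shown. The names `toy_encoder`, `embedding_loss`, and `pit_loss` are hypothetical, `toy_encoder` is a simple frame-energy stand-in for a frozen pre-trained ASR encoder, and the use of clean reference audio and a mean absolute difference are assumptions.

```python
import itertools

import numpy as np


def toy_encoder(signal, frame=4):
    """Hypothetical stand-in for a frozen pre-trained ASR encoder.

    Returns one log-energy value per non-overlapping frame.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    return np.log1p((frames ** 2).sum(axis=1))


def embedding_loss(estimate, reference, encoder=toy_encoder):
    """Mean absolute difference between encoder embeddings (assumed distance)."""
    return float(np.mean(np.abs(encoder(estimate) - encoder(reference))))


def pit_loss(estimates, references, pair_loss=embedding_loss):
    """Standard PIT: try every speaker permutation and keep the lowest loss.

    Returns (best_loss, best_perm), where best_perm[i] is the index of the
    estimate assigned to reference i.
    """
    n = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        loss = sum(
            pair_loss(estimates[p], references[i]) for i, p in enumerate(perm)
        ) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

No transcriptions appear anywhere in the computation: only audio signals are passed to the encoder, which is the property the abstract emphasises. In a real setup the encoder would be a differentiable pre-trained model and the loss would be back-propagated into the separator.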
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2024 ISCA. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | speech recognition; speech separation; multispeaker; adaptation; fine-tuning |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Engineering and Physical Sciences Research Council, grant number 2268977 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Jun 2024 08:57 |
Last Modified: | 02 Sep 2024 13:19 |
Status: | Published |
Publisher: | International Speech Communication Association (ISCA) |
Refereed: | Yes |
Identification Number: | 10.21437/Interspeech.2024-1264 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:213506 |