Close, G. orcid.org/0000-0002-9478-5421, Hain, T. orcid.org/0000-0003-0939-3464 and Goetze, S. orcid.org/0000-0003-1044-7343 (2023) The effect of spoken language on speech enhancement using self-supervised speech representation loss functions. In: Proceedings of 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 22-25 Oct 2023, New Paltz, NY, USA. Institute of Electrical and Electronics Engineers (IEEE) ISBN 9798350323733
Abstract
Recent work in speech enhancement (SE) has used self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, prior work has paid little attention to the relationship between the language of the audio used to train the self-supervised representation and the language used to train the SE system. Enhancement models trained with a loss function incorporating a self-supervised representation whose training language exactly matches that of the noisy data used to train the SE system outperform those without an exact match. This may lead to enhancement systems that are language specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time-domain loss functions. In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which are themselves trained on different language combinations and with differing network structures as loss function representations. These models are then tested across unseen languages and their performance is analysed. It is found that the training language of the self-supervised representation appears to have a minor effect on enhancement performance; the amount of training data in a particular language, however, greatly affects performance.
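To illustrate the kind of loss the abstract describes, the sketch below computes a feature-space distance between an enhanced and a clean waveform using a frozen feature extractor. This is a minimal illustration, not the paper's method: the names (`encode`, `sssr_feature_loss`, `FRAME`, `HOP`, `W`) are invented here, and a fixed random projection stands in for the pretrained self-supervised model (e.g. a HuBERT- or wav2vec-style encoder) that the actual systems would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen random projection standing in for a pretrained SSSR encoder.
# In the setting described in the abstract, this would be a fixed
# self-supervised model whose training language may or may not match
# the SE training data; here it is purely illustrative.
FRAME = 400  # 25 ms frames at 16 kHz
HOP = 160    # 10 ms hop at 16 kHz
W = rng.standard_normal((FRAME, 64)) / np.sqrt(FRAME)

def encode(x: np.ndarray) -> np.ndarray:
    """Map a 1-D waveform to a (num_frames, 64) 'representation'."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP : i * HOP + FRAME] for i in range(n_frames)])
    return frames @ W

def sssr_feature_loss(enhanced: np.ndarray, clean: np.ndarray) -> float:
    """Mean absolute distance between the two feature representations."""
    return float(np.mean(np.abs(encode(enhanced) - encode(clean))))

clean = rng.standard_normal(16000)                # 1 s of 16 kHz "audio"
noisy = clean + 0.1 * rng.standard_normal(16000)  # degraded estimate

# An identical signal gives zero loss; a degraded one gives a positive loss.
print(sssr_feature_loss(clean, clean), sssr_feature_loss(noisy, clean))
```

In a real training loop the loss would be differentiable and backpropagated through the frozen encoder into the enhancement network, which is where the encoder's training language can influence what the SE model learns.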
Metadata
Item Type: Proceedings Paper
Authors/Creators: Close, G.; Hain, T.; Goetze, S.
Copyright, Publisher and Additional Information: © 2025 The Authors. Except as otherwise noted, this author-accepted version of a conference paper published in Proceedings of the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords: Speech enhancement; self-supervised speech representations; language; domain adaptation; neural networks
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information: Engineering and Physical Sciences Research Council, grant EP/S023062/1
Depositing User: Symplectic Sheffield
Date Deposited: 18 Jul 2025 13:22
Last Modified: 19 Jul 2025 23:41
Status: Published
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Refereed: Yes
Identification Number (DOI): 10.1109/waspaa58266.2023.10248166
Open Archives Initiative ID (OAI ID): oai:eprints.whiterose.ac.uk:229419