This is the latest version of this eprint.
Close, G., Ravenscroft, W., Hain, T. orcid.org/0000-0003-0939-3464 et al. (1 more author) (2023) Perceive and predict: self-supervised speech representation based loss functions for speech enhancement. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-10 Jun 2023, Rhodes Island, Greece. Institute of Electrical and Electronics Engineers (IEEE) ISBN 9781728163284
Abstract
Recent work in the domain of speech enhancement has explored the use of self-supervised speech representations to aid in the training of neural speech enhancement models. However, much of this work focuses on using the deepest or final outputs of self-supervised speech representation models, rather than the earlier feature encodings. The use of self-supervised representations in such a way is often not fully motivated. In this work it is shown that the distance between the feature encodings of clean and noisy speech correlates strongly with psychoacoustically motivated measures of speech quality and intelligibility, as well as with human Mean Opinion Score (MOS) ratings. Experiments using this distance as a loss function are performed, and improved performance over the use of STFT spectrogram distance-based loss, as well as other common loss functions from the speech enhancement literature, is demonstrated using objective measures such as perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI).
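The loss described in the abstract, a distance between the feature encodings of an enhanced signal and its clean reference, can be illustrated with a small sketch. Everything named here is an assumption made for illustration: `toy_feature_encoder` is a hypothetical stand-in (a frozen, randomly initialised strided convolution with ReLU) for the feature encoder of a pretrained self-supervised model, and mean squared error is assumed as the distance, since the abstract does not name one. It is not the authors' implementation.

```python
import math
import random


def toy_feature_encoder(signal, kernels, hop):
    """Hypothetical stand-in for a pretrained feature encoder: a strided
    1-D convolution followed by ReLU. Returns one feature vector per frame."""
    width = len(kernels[0])
    frames = []
    for start in range(0, len(signal) - width + 1, hop):
        window = signal[start:start + width]
        frames.append([
            max(0.0, sum(w * s for w, s in zip(kernel, window)))
            for kernel in kernels
        ])
    return frames


def mean_squared_distance(feats_a, feats_b):
    """Mean squared difference over all frames and feature dimensions
    (an assumed choice of distance)."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(feats_a, feats_b):
        for a, b in zip(frame_a, frame_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def encoding_distance_loss(estimate, clean, kernels, hop=4):
    """Distance between the encodings of an estimate and its clean reference."""
    return mean_squared_distance(
        toy_feature_encoder(estimate, kernels, hop),
        toy_feature_encoder(clean, kernels, hop),
    )


rng = random.Random(0)
# Frozen random kernels: 4 feature channels, each 8 samples wide.
kernels = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
# Synthetic "clean" tone and a noisy copy of it.
clean = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
noisy = [c + rng.gauss(0, 0.3) for c in clean]

loss_clean = encoding_distance_loss(clean, clean, kernels)  # identical inputs
loss_noisy = encoding_distance_loss(noisy, clean, kernels)  # degraded input
```

In a training setting the encoder would be kept fixed and the loss would be minimised with respect to the enhancement model's output; here the two calls only show that the loss is zero for a perfect estimate and positive for a degraded one.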
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2023 The Authors. Except as otherwise noted, this author-accepted version of a paper published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | self-supervised representations; speech enhancement; loss functions; neural networks |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: Engineering and Physical Sciences Research Council; Grant number: EP/S023062/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 21 Jun 2023 09:24 |
Last Modified: | 04 Sep 2023 12:43 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/icassp49357.2023.10095666 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:200678 |
Available Versions of this Item
- Perceive and predict: self-supervised speech representation based loss functions for speech enhancement. (deposited 01 Aug 2023 16:39)
- Perceive and predict: self-supervised speech representation based loss functions for speech enhancement. (deposited 21 Jun 2023 09:24) [Currently Displayed]
Download
Filename: _ICASSP2023__SSSR___SpecMSE_for_Speech_Enhancement__George_ (1).pdf
Licence: CC-BY 4.0