Close, G., Ravenscroft, W., Hain, T. et al. (1 more author) (2024) Multi-CMGAN+/+: leveraging multi-objective speech quality metric prediction for speech enhancement. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), 14-19 Apr 2024, Seoul, Korea. Institute of Electrical and Electronics Engineers (IEEE) , pp. 351-355. ISBN 979-8-3503-4486-8
Abstract
Neural network based approaches to speech enhancement have shown to be particularly powerful, being able to leverage a data-driven approach to result in a significant performance gain versus other approaches. Such approaches are reliant on artificially created labelled training data such that the neural model can be trained using intrusive loss functions which compare the output of the model with clean reference speech. Performance of such systems when enhancing real-world audio often suffers relative to their performance on simulated test data. In this work, a non-intrusive multi-metric prediction approach is introduced, wherein a model trained on artificial labelled data using inference of an adversarially trained metric prediction neural network. The proposed approach shows improved performance versus state-of-the-art systems on the recent CHiME-7 challenge unsupervised domain adaptation speech enhancement (UDASE) task evaluation sets.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2024 The Author(s). Except as otherwise noted, this author-accepted version of a conference paper published in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | speech enhancement; model generalisation; generative adversarial networks; conformer; metric prediction |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder Grant number LIVEPERSON, INC. UNSPECIFIED |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Mar 2024 15:08 |
Last Modified: | 28 Mar 2024 11:30 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP48485.2024.10448343 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:210277 |
Download
Filename: _George__ICASSP_MultiMetricGAN____no_real_data_final.pdf
Licence: CC-BY 4.0