Close, G., Hain, T. orcid.org/0000-0003-0939-3464 and Goetze, S. orcid.org/0000-0003-1044-7343 (2024) Hallucination in perceptual metric-driven speech enhancement networks. In: Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO). 2024 32nd European Signal Processing Conference (EUSIPCO), 26-30 Aug 2024, Lyon, France. Institute of Electrical and Electronics Engineers (IEEE), pp. 21-25. ISBN: 9798331519773 ISSN: 2219-5491 EISSN: 2076-1465
Abstract
Within the area of speech enhancement, there is an ongoing interest in the creation of neural systems which explicitly aim to improve the perceptual quality of the processed audio. In concert with this is the topic of non-intrusive (i.e. without clean reference) speech quality prediction, for which neural networks are trained to predict human-assigned quality labels directly from distorted audio. When combined, these areas allow for the creation of powerful new speech enhancement systems which can leverage large real-world datasets of distorted audio, by taking inference of a pre-trained speech quality predictor as the sole loss function of the speech enhancement system. This paper aims to identify a potential pitfall with this approach, namely hallucinations which are introduced by the enhancement system ‘tricking’ the speech quality predictor.
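The training setup described in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' system: the "quality predictor" here is a hypothetical frozen linear-plus-sigmoid model with arbitrary weights, and the "enhancer" is just a learnable additive correction, whereas real systems use neural networks for both. All names and values below are assumptions chosen for illustration. The only point shown is that the loss is the negated predictor score, with no clean reference involved.

```python
import math


def predict_quality(signal, weights, bias):
    """Frozen stand-in for a pre-trained non-intrusive quality predictor.

    Maps a signal to a MOS-like score in (1, 5) without a clean reference.
    """
    z = sum(w * s for w, s in zip(weights, signal)) + bias
    return 1.0 + 4.0 / (1.0 + math.exp(-z))


def quality_gradient(signal, weights, bias):
    """Analytic gradient of predict_quality with respect to the signal."""
    z = sum(w * s for w, s in zip(weights, signal)) + bias
    sig = 1.0 / (1.0 + math.exp(-z))
    return [4.0 * sig * (1.0 - sig) * w for w in weights]


def train_enhancer(noisy, weights, bias, steps=200, lr=0.05):
    """Toy 'enhancer': a learnable additive correction trained only to
    maximise the predictor's score (loss = -predicted quality)."""
    correction = [0.0] * len(noisy)
    for _ in range(steps):
        enhanced = [x + d for x, d in zip(noisy, correction)]
        grad = quality_gradient(enhanced, weights, bias)
        # Gradient descent on loss = -quality is gradient ascent on quality.
        correction = [d + lr * g for d, g in zip(correction, grad)]
    return [x + d for x, d in zip(noisy, correction)]


def squared_error(a, b):
    """Sum of squared differences between two equal-length signals."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical toy data: the clean signal is used only for evaluation,
# never inside the training loop.
clean = [0.0, 0.5, -0.5, 0.25]
noise = [0.1, -0.1, 0.1, -0.1]
noisy = [c + n for c, n in zip(clean, noise)]
weights = [0.5, -0.5, 0.5, -0.5]  # arbitrary frozen predictor parameters
bias = 0.0

enhanced = train_enhancer(noisy, weights, bias)
score_before = predict_quality(noisy, weights, bias)
score_after = predict_quality(enhanced, weights, bias)
error_before = squared_error(noisy, clean)
error_after = squared_error(enhanced, clean)
```

In this toy configuration the predicted score rises while the squared error against the clean signal also grows, because the arbitrary predictor weights happen to reward the direction of the noise. That is only a loose analogue of the 'tricking' behaviour the abstract warns about, not a reproduction of the paper's findings.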
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Close, G.; Hain, T. (ORCID: 0000-0003-0939-3464); Goetze, S. (ORCID: 0000-0003-1044-7343) |
Copyright, Publisher and Additional Information: | © 2024 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | Speech enhancement; non-intrusive speech quality prediction; generative models for signal enhancement |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 07 Aug 2025 13:22 |
Last Modified: | 07 Aug 2025 13:22 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.23919/eusipco63174.2024.10714927 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230147 |