Using speech foundational models in loss functions for hearing aid speech enhancement

This is the latest version of this eprint.

Sutherland, R., Close, G., Hain, T. orcid.org/0000-0003-0939-3464 et al. (2 more authors) (2024) Using speech foundational models in loss functions for hearing aid speech enhancement. In: Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO). 2024 32nd European Signal Processing Conference (EUSIPCO), 26-30 Aug 2024, Lyon, France. Institute of Electrical and Electronics Engineers (IEEE), pp. 421-425. ISBN: 9798-331519773. ISSN: 2219-5491. EISSN: 2076-1465.

Abstract

Machine learning techniques are an active area of research for speech enhancement for hearing aids, with one particular focus on improving the intelligibility of a noisy speech signal. Recent work has shown that feature encodings from self-supervised speech representation models can effectively capture speech intelligibility. In this work, it is shown that the distance between self-supervised speech representations of clean and noisy speech correlates more strongly with human intelligibility ratings than other signal-based metrics. Experiments show that training a speech enhancement model using this distance as part of a loss function improves performance over using an SNR-based loss function, as demonstrated by increases in HASPI, STOI, PESQ and SI-SNR scores. This method requires inference of the high-parameter-count model only at training time, so the speech enhancement model itself can remain small, as is required for hearing aids.
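
The idea of using a representation distance as part of a training loss can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the representation model, the distance measure, or how the terms are weighted, so everything below is an assumption made for illustration. In particular, `toy_encoder` is a hypothetical stand-in for a frozen self-supervised speech model, the distance is assumed to be a mean squared difference, the SNR-based term is assumed to be negative SI-SNR, and `weight` is an invented mixing parameter.

```python
import math
from typing import Callable, List, Sequence

Features = List[List[float]]  # frames x feature dimensions


def toy_encoder(signal: Sequence[float], frame_len: int = 4) -> Features:
    """Hypothetical stand-in for a frozen self-supervised speech model.

    Emits a 2-dimensional vector (mean, mean absolute value) for each
    non-overlapping frame. A real system would run a pretrained model here.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean = sum(frame) / frame_len
        mean_abs = sum(abs(x) for x in frame) / frame_len
        feats.append([mean, mean_abs])
    return feats


def representation_distance(a: Features, b: Features) -> float:
    """Mean squared difference between two equally shaped feature sequences.

    Assumption: the source does not state which distance is used.
    """
    if len(a) != len(b) or not a:
        raise ValueError("feature sequences must be non-empty and equal length")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for xa, xb in zip(frame_a, frame_b):
            total += (xa - xb) ** 2
            count += 1
    return total / count


def neg_si_snr(estimate: Sequence[float], target: Sequence[float],
               eps: float = 1e-8) -> float:
    """Negative scale-invariant SNR in dB, so that lower is better."""
    n = len(target)
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]   # remove DC offset
    tgt = [t - tgt_mean for t in target]
    dot = sum(e * t for e, t in zip(est, tgt))
    scale = dot / (sum(t * t for t in tgt) + eps)
    projection = [scale * t for t in tgt]    # part of est explained by target
    residual = [e - p for e, p in zip(est, projection)]
    signal_power = sum(p * p for p in projection) + eps
    noise_power = sum(r * r for r in residual) + eps
    return -10.0 * math.log10(signal_power / noise_power)


def combined_loss(enhanced: Sequence[float], clean: Sequence[float],
                  encoder: Callable[[Sequence[float]], Features] = toy_encoder,
                  weight: float = 0.5) -> float:
    """Weighted sum of representation distance and negative SI-SNR.

    Assumption: `weight` and the linear combination are illustrative only.
    """
    dist = representation_distance(encoder(enhanced), encoder(clean))
    return weight * dist + (1.0 - weight) * neg_si_snr(enhanced, clean)


# Usage: a lightly corrupted signal should score lower than a heavily corrupted one.
clean = [math.sin(0.3 * n) for n in range(32)]
light = [c + 0.01 * (1 if n % 2 == 0 else -1) for n, c in enumerate(clean)]
heavy = [c + 0.5 * (1 if n % 2 == 0 else -1) for n, c in enumerate(clean)]
loss_light = combined_loss(light, clean)
loss_heavy = combined_loss(heavy, clean)
```

In this sketch the encoder is only called inside the loss, which mirrors the point made in the abstract: the large model is needed while training, and the enhancement model that is deployed does not depend on it.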

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Sutherland, R.; Close, G.; Hain, T. (https://orcid.org/0000-0003-0939-3464); Goetze, S. (https://orcid.org/0000-0003-1044-7343); Barker, J. (https://orcid.org/0000-0002-1684-5660)
Copyright, Publisher and Additional Information:	© 2024 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords:	self-supervised speech representations; speech enhancement; loss functions
Dates:	Published (online): 23 October 2024; Published: 23 October 2024
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	06 Aug 2025 13:47
Last Modified:	17 Oct 2025 12:17
Status:	Published
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Identification Number:	10.23919/eusipco63174.2024.10714933
Related URLs:	Author arXiv URL Conference
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:230087

Available Versions of this Item

Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement. (deposited 17 Oct 2025 12:15)
- Using speech foundational models in loss functions for hearing aid speech enhancement. (deposited 06 Aug 2025 13:47) [Currently Displayed]

Download

Accepted Version

Filename: 2407.13333v1.pdf

Licence: CC-BY 4.0
