Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement

There is a more recent version of this eprint available.

This is a preprint and may not have undergone formal peer review.

Abstract

Machine learning techniques are an active area of research in speech enhancement for hearing aids, with a particular focus on improving the intelligibility of noisy speech signals. Recent work has shown that feature encodings from self-supervised speech representation models can effectively capture speech intelligibility. This work shows that the distance between self-supervised speech representations of clean and noisy speech correlates more strongly with human intelligibility ratings than other signal-based metrics. Experiments show that training a speech enhancement model with this distance as part of the loss function improves performance over an SNR-based loss function, as demonstrated by increases in HASPI, STOI, PESQ and SI-SNR scores. The method requires inference of the high-parameter-count model only at training time, so the speech enhancement model itself can remain small, as is required for hearing aids.
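As a rough illustration of the idea in the abstract, the sketch below combines an SI-SNR term with a distance between feature encodings of the enhanced and clean signals. It is not the authors' implementation: `toy_encoder` is a hypothetical stand-in (framewise log-magnitude spectra) for a frozen self-supervised speech representation model, and the mean-squared distance and the `weight` parameter are assumptions, since the abstract does not specify them. In actual training the loss would be computed in an automatic-differentiation framework so that gradients flow through the frozen model into the enhancement model; NumPy is used here only to evaluate the loss value.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    return 10.0 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(noise ** 2) + eps))


def toy_encoder(signal, frame_len=256, hop=128):
    """Hypothetical stand-in for a frozen speech representation model.

    Returns framewise log-magnitude spectra; a real setup would return the
    feature encodings of a pretrained self-supervised model instead.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    window = np.hanning(frame_len)
    return np.log1p(np.abs(np.fft.rfft(frames * window, axis=1)))


def representation_distance(estimate, target, encoder=toy_encoder):
    """Mean squared distance between encoder features (distance choice is an assumption)."""
    return float(np.mean((encoder(estimate) - encoder(target)) ** 2))


def combined_loss(estimate, target, weight=0.5, encoder=toy_encoder):
    """Negative SI-SNR plus a weighted representation distance (weight is an assumption)."""
    return -si_snr(estimate, target) + weight * representation_distance(estimate, target, encoder)


# Synthetic example: a tone with two different levels of additive noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.5 * rng.standard_normal(clean.shape)
slightly_noisy = clean + 0.05 * rng.standard_normal(clean.shape)

loss_noisy = combined_loss(noisy, clean)
loss_slightly_noisy = combined_loss(slightly_noisy, clean)
```

Because the encoder is called only inside the loss, it is needed at training time and not at inference time, which is the property the abstract highlights for keeping the deployed enhancement model small.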

Metadata

Item Type:	Preprint
Authors/Creators:	Sutherland, R.; Close, G.; Hain, T. (https://orcid.org/0000-0003-0939-3464); Goetze, S. (https://orcid.org/0000-0003-1044-7343); Barker, J. (https://orcid.org/0000-0002-1684-5660)
Copyright, Publisher and Additional Information:	© 2024 The Author(s). This preprint is made available under a Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0/)
Keywords:	Biomedical and Clinical Sciences; Allied Health and Rehabilitation Science; Clinical Sciences; Health Sciences; Information and Computing Sciences; 4602 Artificial Intelligence; 4603 Computer Vision and Multimedia Computation; Neurosciences; Prevention; Clinical Research; Rehabilitation; Bioengineering; Machine Learning and Artificial Intelligence; Assistive Technology; Ear
Dates:	Submitted: 18 July 2024
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	17 Oct 2025 12:15
Last Modified:	17 Oct 2025 12:15
Status:	Submitted
Identification Number:	10.48550/arxiv.2407.13333
Related URLs:	arXiv URL
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:233134

Available Versions of this Item

Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement. (deposited 17 Oct 2025 12:15) [Currently Displayed]
- Using speech foundational models in loss functions for hearing aid speech enhancement. (deposited 06 Aug 2025 13:47)

Download

Preprint

Filename: 2407.13333v1 (1).pdf

Licence: CC-BY 4.0
