Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

Abstract

An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, speech distortion caused by environmental noise is seen as a spectral mask; as a result, each noisy speech feature is either reliable (speech is not masked by noise) or unreliable (speech is masked). In this paper, we present a detailed overview of this model and its applications to noise robust ASR. Firstly, using the masking model, we derive a spectral reconstruction technique aimed at enhancing the noisy speech features. Two problems must be solved in order to perform spectral reconstruction using the masking model: (1) mask estimation, i.e. determining the reliability of the noisy features, and (2) feature imputation, i.e. estimating speech for the unreliable features. Unlike missing-data imputation techniques, which treat the two problems independently, our technique addresses them jointly by exploiting a priori knowledge of the speech and noise sources in the form of a statistical model. Secondly, we propose an algorithm for estimating the noise model required by the feature enhancement technique. The proposed algorithm fits a Gaussian mixture model to the noise by iteratively maximising the likelihood of the noisy speech signal, so that noise can be estimated even during speech-dominated frames. A comprehensive set of experiments carried out on the Aurora-2 and Aurora-4 databases shows that the proposed method achieves significant improvements over the baseline system and other similar missing-data imputation techniques.
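As a rough illustration of how mask estimation and imputation can be solved jointly, the Python sketch below assumes the masking model takes the usual log-max form y = max(x, n) in the log-Mel domain, a diagonal-covariance GMM prior for clean speech, and, as a simplification of the noise GMM estimated in the paper, a single diagonal Gaussian noise model. The function reconstruct_frame and all parameter names are hypothetical; this is not the authors' implementation. For each bin it computes the posterior probability that speech dominates (the soft mask) and, for masked bins, the mean of the speech prior truncated at the observed value, then combines both into an MMSE estimate weighted by the speech component posteriors.

```python
import math

import numpy as np

# Element-wise complementary error function built from the stdlib math.erfc.
_erfc = np.vectorize(math.erfc, otypes=[float])
_TINY = 1e-300  # floor that keeps logs and divisions finite


def _log_normal_pdf(v, mu, var):
    """Element-wise log N(v; mu, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (v - mu) ** 2 / var)


def _normal_cdf(v, mu, var):
    """Element-wise P(V <= v) for V ~ N(mu, var); erfc keeps the lower tail accurate."""
    return 0.5 * _erfc(-(v - mu) / np.sqrt(2.0 * var))


def reconstruct_frame(y, weights, mu_x, var_x, mu_n, var_n):
    """Hypothetical MMSE estimate of a clean frame x from a noisy frame y = max(x, n).

    y: (D,) noisy log-Mel frame; weights: (K,); mu_x, var_x: (K, D) diagonal
    speech GMM; mu_n, var_n: (D,) single diagonal Gaussian noise model (assumption).
    Returns (x_hat, reliability), both of shape (D,).
    """
    y = np.asarray(y, dtype=float)
    mu_x, var_x = np.asarray(mu_x, dtype=float), np.asarray(var_x, dtype=float)
    mu_n, var_n = np.asarray(mu_n, dtype=float), np.asarray(var_n, dtype=float)

    cdf_x = _normal_cdf(y, mu_x, var_x)  # P(x <= y | k), shape (K, D)
    cdf_n = _normal_cdf(y, mu_n, var_n)  # P(n <= y), shape (D,)

    # y arises either because speech dominates (x = y, n <= y) or noise masks it (n = y, x <= y).
    log_rel = _log_normal_pdf(y, mu_x, var_x) + np.log(np.maximum(cdf_n, _TINY))
    log_msk = _log_normal_pdf(y, mu_n, var_n) + np.log(np.maximum(cdf_x, _TINY))
    log_py = np.logaddexp(log_rel, log_msk)  # log p(y_d | k), shape (K, D)

    # Posterior over speech components given the whole frame.
    log_post = np.log(np.asarray(weights, dtype=float)) + log_py.sum(axis=1)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Per-component probability that each bin is reliable: the soft mask.
    w = np.exp(log_rel - log_py)

    # Mean of x under component k truncated to x <= y: the imputation for masked bins.
    sd = np.sqrt(var_x)
    z = (y - mu_x) / sd
    pdf_z = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    trunc_mean = np.minimum(mu_x - sd * pdf_z / np.maximum(cdf_x, _TINY), y)

    x_k = w * y + (1.0 - w) * trunc_mean  # per-component MMSE estimate
    return post @ x_k, post @ w


if __name__ == "__main__":
    # Toy 3-bin frame with a two-component speech prior; all values are made up.
    x_hat, rel = reconstruct_frame(
        y=[9.0, 4.2, 4.0],
        weights=[0.6, 0.4],
        mu_x=[[9.0, 3.0, 1.0], [2.0, 7.0, 6.0]],
        var_x=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
        mu_n=[4.0, 4.0, 4.0],
        var_n=[0.5, 0.5, 0.5],
    )
    print("x_hat:", np.round(x_hat, 2), "reliability:", np.round(rel, 2))
```

Because the soft mask and the imputed value come from the same posterior, the two problems are not solved as separate stages. The masked-bin estimate is a truncated mean, so the reconstructed value never exceeds the observed noisy value, consistent with the masking model's assumption that noise can only raise the log energy.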

Metadata

Item Type:	Article
Authors/Creators:	Gonzalez, J.A.; Gómez, A.M.; Peinado, A.M.; Ma, N. (https://orcid.org/0000-0002-4112-3109); Barker, J. (https://orcid.org/0000-0002-1684-5660)
Copyright, Publisher and Additional Information:	© 2017 Springer Science+Business Media New York. This is an author-produced version of a paper subsequently published in Circuits, Systems, and Signal Processing. Uploaded in accordance with the publisher's self-archiving policy.
Keywords:	Speech recognition; Noise robustness; Feature compensation; Noise model estimation; Missing data imputation
Dates:	Accepted: 20 December 2016; Published (online): 6 January 2017; Published: 6 January 2017
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	10 Feb 2017 11:46
Last Modified:	06 Jul 2023 10:41
Published Version:	https://doi.org/10.1007/s00034-016-0480-7
Status:	Published
Publisher:	Springer Verlag (Germany)
Refereed:	Yes
Identification Number:	10.1007/s00034-016-0480-7
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:112035
