Comparing human and automatic speech recognition in a perceptual restoration experiment

Abstract

Speech that has been distorted by introducing spectral or temporal gaps is still perceived as continuous and complete by human listeners, so long as the gaps are filled with additive noise of sufficient intensity. When such perceptual restoration occurs, the speech is also more intelligible than when no noise is added to the gaps. This observation has motivated so-called 'missing data' systems for automatic speech recognition (ASR), but there have been few attempts to determine whether such systems are a good model of perceptual restoration in human listeners. Accordingly, the current paper evaluates missing data ASR in a perceptual restoration task. We evaluated two systems that use a new approach to bounded marginalisation in the cepstral domain, and a bounded conditional mean imputation method. Both methods model available speech information as a clean-speech posterior distribution that is subsequently passed to an ASR system. The proposed missing data ASR systems were evaluated using distorted speech, in which spectro-temporal gaps were optionally filled with additive noise. Speech recognition performance of the proposed systems was compared against a baseline ASR system, and with human speech recognition performance on the same task. We conclude that missing data methods improve speech recognition performance in a manner that is consistent with perceptual restoration in human listeners.

Metadata

Item Type:	Article
Authors/Creators:	Remes, U.; López, A.R.; Juvela, L.; Palomäki, K.; Brown, G.J.; Alku, P.; Kurimo, M.
Copyright, Publisher and Additional Information:	© 2015 Elsevier. This is an author produced version of a paper subsequently published in Computer Speech and Language. Uploaded in accordance with the publisher's self-archiving policy. Article available under the terms of the CC-BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords:	Automatic speech recognition; Missing data; Observation uncertainties; Perceptual restoration; Uncertainty propagation
Dates:	Accepted: 16 June 2015; Published: 24 June 2015
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	29 Oct 2015 14:52
Last Modified:	25 Jul 2017 14:44
Published Version:	https://doi.org/10.1016/j.csl.2015.06.005
Status:	Published
Publisher:	Elsevier
Refereed:	Yes
Identification Number:	10.1016/j.csl.2015.06.005
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:90470
