Remes, U., López, A.R., Juvela, L. et al. (4 more authors) (2015) Comparing human and automatic speech recognition in a perceptual restoration experiment. Computer Speech and Language, 35, pp. 14-31. ISSN 0885-2308
Abstract
Speech that has been distorted by introducing spectral or temporal gaps is still perceived as continuous and complete by human listeners, so long as the gaps are filled with additive noise of sufficient intensity. When such perceptual restoration occurs, the speech is also more intelligible compared to the case in which noise has not been added in the gaps. This observation has motivated so-called 'missing data' systems for automatic speech recognition (ASR), but there have been few attempts to determine whether such systems are a good model of perceptual restoration in human listeners. Accordingly, the current paper evaluates missing data ASR in a perceptual restoration task. We evaluated two systems that use a new approach to bounded marginalisation in the cepstral domain, and a bounded conditional mean imputation method. Both methods model available speech information as a clean-speech posterior distribution that is subsequently passed to an ASR system. The proposed missing data ASR systems were evaluated using distorted speech, in which spectro-temporal gaps were optionally filled with additive noise. Speech recognition performance of the proposed systems was compared against a baseline ASR system, and with human speech recognition performance on the same task. We conclude that missing data methods improve speech recognition performance in a manner that is consistent with perceptual restoration in human listeners.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Remes, U.; López, A.R.; Juvela, L. et al. (4 more authors) |
Copyright, Publisher and Additional Information: | © 2015 Elsevier. This is an author produced version of a paper subsequently published in Computer Speech and Language. Uploaded in accordance with the publisher's self-archiving policy. Article available under the terms of the CC-BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/) |
Keywords: | Automatic speech recognition; Missing data; Observation uncertainties; Perceptual restoration; Uncertainty propagation |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 29 Oct 2015 14:52 |
Last Modified: | 25 Jul 2017 14:44 |
Published Version: | https://doi.org/10.1016/j.csl.2015.06.005 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | 10.1016/j.csl.2015.06.005 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:90470 |