Loweimi, E., Barker, J. orcid.org/0000-0002-1684-5660 and Hain, T. orcid.org/0000-0003-0939-3464 (2017) Statistical Normalisation of Phase-based Feature Representation For Robust Speech Recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE International Conference on Acoustics Speech and Signal Processing, 05/03/2017 - 09/03/2017, New Orleans. Institute of Electrical and Electronics Engineers ISBN 978-1-5090-4117-6
Abstract
In earlier work we proposed a source-filter decomposition of speech through phase-based processing. The decomposition leads to novel speech features that are extracted from the filter component of the phase spectrum. This paper analyses this spectrum and the proposed representation by evaluating statistical properties at various points along the parametrisation pipeline. We show that the speech phase spectrum has a bell-shaped distribution, in contrast to the uniform distribution that is usually assumed. We demonstrate that the uniform density (which implies that the corresponding sequence is least informative) is an artefact of phase wrapping and not an inherent characteristic of this spectrum. In addition, we extend statistical normalisation, usually applied to magnitude-based features, to the phase domain. Based on the statistical structure of the phase-based features, which is shown to be super-Gaussian in the clean condition, three normalisation schemes, namely Gaussianisation, Laplacianisation and table-based histogram equalisation, are applied to improve robustness. Speech recognition experiments on Aurora-2 show that applying an optimal normalisation scheme at the right stage of the feature extraction process can produce average relative WER reductions of up to 18.6% across the 0-20 dB SNR conditions.
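As an illustration of the Gaussianisation idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a rank-based estimate of the empirical CDF over a single feature dimension, followed by the inverse standard normal CDF. The function name `gaussianise` and the `(rank + 0.5) / n` offset are illustrative choices, not details taken from the paper.

```python
from statistics import NormalDist


def gaussianise(values):
    """Map a sequence of feature values to standard normal scores.

    Assumption: the empirical CDF is estimated from ranks within the
    sequence itself (for example, one utterance). Ties are broken by
    original position, which is a simplification.
    """
    n = len(values)
    # Indices of the input sorted by value, so order[rank] = original index.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # The 0.5 offset keeps the CDF estimate strictly inside (0, 1),
        # where the inverse normal CDF is finite.
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


if __name__ == "__main__":
    # Hypothetical toy values for one feature dimension.
    print(gaussianise([3.0, 1.0, 2.0, 10.0, -4.0]))
```

The mapping preserves the ordering of the input values and changes only their marginal distribution. Replacing `std_normal` with the inverse CDF of another target density would give the analogous mapping for that target.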
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Loweimi, E.; Barker, J. (orcid.org/0000-0002-1684-5660); Hain, T. (orcid.org/0000-0003-0939-3464) |
Copyright, Publisher and Additional Information: | © Institute of Electrical and Electronics Engineers, 2017. This is an author produced version of a paper subsequently published in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. Uploaded in accordance with the publisher's self-archiving policy. |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 08 Mar 2017 11:08 |
Last Modified: | 14 Aug 2017 10:03 |
Published Version: | https://doi.org/10.1109/ICASSP.2017.7953170 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP.2017.7953170 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:112751 |