Saz, O. and Hain, T. orcid.org/0000-0003-0939-3464 (2013) Asynchronous factorisation of speaker and background with feature transforms in speech recognition. In: INTERSPEECH-2013. INTERSPEECH 2013 - 14th Annual Conference of the International Speech Communication Association, 25-29 Aug 2013, Lyon, France. ISCA, pp. 1238-1242.
Abstract
This paper presents a novel approach to separating the effects of speaker and background conditions by applying feature-transform-based adaptation for Automatic Speech Recognition (ASR). So far, factorisation has been shown to yield improvements in the case of utterance-synchronous environments. In this paper we show successful separation of conditions asynchronous with speech, such as background music. Our work takes account of the asynchronous nature of the background by estimating condition-specific Constrained Maximum Likelihood Linear Regression (CMLLR) transforms. In addition, speaker adaptation is performed, allowing speaker and background effects to be factorised. Equally, background transforms are used asynchronously in the decoding process, using a modified Hidden Markov Model (HMM) topology which applies the optimal transform for each frame. Experimental results are presented on the WSJCAM0 corpus of British English speech, modified to contain controlled sections of background music. This addition of music degrades the baseline Word Error Rate (WER) from 10.1% to 26.4%. While synchronous factorisation with CMLLR transforms provides a 28% relative improvement in WER over the baseline, our asynchronous approach increases this reduction to 33%.
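The idea of applying the best background transform to each frame can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes a modified HMM topology inside the decoder, whereas the sketch below scores each frame independently with no transition constraints. It assumes diagonal affine transforms, a single diagonal-covariance Gaussian standing in for the acoustic model, and made-up values throughout; the names `cmllr_frame_score` and `pick_transform_per_frame` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def cmllr_frame_score(x, scale, bias, mean, var):
    """Log-likelihood of frame x after the affine transform x' = a * x + b.

    For a diagonal transform, log|det A| is the sum of log|a_i|; including
    it keeps scores comparable across different transforms.
    """
    transformed = [a * xi + b for a, xi, b in zip(scale, x, bias)]
    log_det = sum(math.log(abs(a)) for a in scale)
    return log_gauss_diag(transformed, mean, var) + log_det


def pick_transform_per_frame(frames, transforms, mean, var):
    """Return, for each frame, the name of the highest-scoring transform."""
    choices = []
    for x in frames:
        best = max(
            transforms,
            key=lambda name: cmllr_frame_score(x, *transforms[name], mean, var),
        )
        choices.append(best)
    return choices


# Hypothetical toy values: a 2-dimensional model with one Gaussian.
MEAN, VAR = [0.0, 0.0], [1.0, 1.0]
TRANSFORMS = {
    "clean": ([1.0, 1.0], [0.0, 0.0]),    # identity transform
    "music": ([1.0, 1.0], [-3.0, -3.0]),  # undoes a constant +3 offset
}
# Frames 2 and 3 carry the simulated "music" offset; frames 1 and 4 do not.
FRAMES = [[0.1, -0.2], [3.2, 2.9], [2.8, 3.1], [-0.1, 0.0]]
CHOICES = pick_transform_per_frame(FRAMES, TRANSFORMS, MEAN, VAR)
```

In this toy setting the selected transform switches within the utterance, which is the behaviour an utterance-synchronous scheme (one transform per utterance) cannot represent.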
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Saz, O.; Hain, T. (orcid.org/0000-0003-0939-3464) |
Copyright, Publisher and Additional Information: | © ISCA 2013. ISCA grants each author permission to use the article in that author's dissertation or in institutional repositories (paper and/or electronic versions), provided that the article is correctly referenced (including page numbers and/or paper number). |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Aug 2016 14:02 |
Last Modified: | 28 Mar 2018 20:25 |
Published Version: | http://www.isca-speech.org/archive/interspeech_201... |
Status: | Published |
Publisher: | ISCA |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:101804 |