Asynchronous factorisation of speaker and background with feature transforms in speech recognition

Abstract

This paper presents a novel approach to separating the effects of speaker and background conditions by applying feature-transform-based adaptation for Automatic Speech Recognition (ASR). So far, factorisation has been shown to yield improvements in the case of utterance-synchronous environments. In this paper we show successful separation of conditions that are asynchronous with speech, such as background music. Our work takes account of the asynchronous nature of the background by estimating condition-specific Constrained Maximum Likelihood Linear Regression (CMLLR) transforms. In addition, speaker adaptation is performed, which allows speaker and background effects to be factorised. Likewise, background transforms are used asynchronously in the decoding process, using a modified Hidden Markov Model (HMM) topology that applies the optimal transform for each frame. Experimental results are presented on the WSJCAM0 corpus of British English speech, modified to contain controlled sections of background music. This addition of music degrades the baseline Word Error Rate (WER) from 10.1% to 26.4%. While synchronous factorisation with CMLLR transforms provides a 28% relative improvement in WER over the baseline, our asynchronous approach increases this reduction to 33%.
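
The relative figures quoted above can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes that the 28% and 33% relative improvements are measured against the 26.4% WER obtained with background music, which the abstract does not state explicitly, and the function names are hypothetical. The absolute WERs it derives are therefore implied values, not results reported in the paper.

```python
def relative_reduction(reference_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on reference_wer."""
    return (reference_wer - new_wer) / reference_wer


def wer_after_reduction(reference_wer: float, relative: float) -> float:
    """Absolute WER implied by a given relative reduction."""
    return reference_wer * (1.0 - relative)


# Figures taken from the abstract (percent WER).
clean_wer = 10.1   # baseline without added music
music_wer = 26.4   # baseline after music is added

# ASSUMPTION: both relative improvements are taken against music_wer.
sync_wer = wer_after_reduction(music_wer, 0.28)
async_wer = wer_after_reduction(music_wer, 0.33)

# Relative increase in WER caused by adding music to the clean baseline.
degradation = (music_wer - clean_wer) / clean_wer

print(f"synchronous:  {sync_wer:.1f}% WER")
print(f"asynchronous: {async_wer:.1f}% WER")
print(f"music raises the baseline WER by {degradation:.0%} relative")
```

Under that assumption, the synchronous and asynchronous systems would sit at roughly 19.0% and 17.7% WER respectively, both still well above the 10.1% obtained without music.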

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Saz, O. and Hain, T. (ORCID: https://orcid.org/0000-0003-0939-3464)
Copyright, Publisher and Additional Information:	© ISCA 2013. ISCA grants each author permission to use the article in that author's dissertation or in institutional repositories (paper and/or electronic versions), provided that the article is correctly referenced (including page numbers and/or paper number).
Dates:	Published: 25 August 2013
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	18 Aug 2016 14:02
Last Modified:	28 Mar 2018 20:25
Published Version:	http://www.isca-speech.org/archive/interspeech_201...
Status:	Published
Publisher:	ISCA
Refereed:	Yes
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:101804
