Investigation of unsupervised adaptation of DNN acoustic models with filter bank input

Yoshioka, T., Ragni, A. orcid.org/0000-0003-0634-4456 and Gales, M.J.F. (2014) Investigation of unsupervised adaptation of DNN acoustic models with filter bank input. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-09 May 2014, Florence, Italy. IEEE, pp. 6344-6348. ISBN 9781479928934

Abstract

Adaptation to speaker variations is an essential component of speech recognition systems. One common approach to adapting deep neural network (DNN) acoustic models is to perform global constrained maximum likelihood linear regression (CMLLR) at some point in the system. Using CMLLR (or, more generally, generative approaches) is especially advantageous in unsupervised adaptation scenarios with high baseline error rates. At the same time, because DNNs are less sensitive than Gaussian mixture models (GMMs) to increases in input dimensionality, it is becoming more popular to use rich speech representations, such as log mel-filter bank channel outputs, instead of conventional low-dimensional feature vectors, such as MFCCs and PLP coefficients. This work discusses and compares three different configurations of DNN acoustic models that allow CMLLR-based speaker adaptive training (SAT) to be performed in systems with filter bank inputs. Results of unsupervised adaptation experiments conducted on three different data sets are presented, demonstrating that, with an appropriate configuration, SAT with CMLLR can improve the performance of a well-trained filter bank-based speaker independent DNN system by 10.6% relative in a challenging task with a baseline error rate above 40%. It is also shown that filter bank features are more advantageous than conventional features even when they are used with SAT models. Further insights are also presented, including the effects of block diagonal transforms and system combination.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Yoshioka, T.; Ragni, A. (https://orcid.org/0000-0003-0634-4456); Gales, M.J.F.
Copyright, Publisher and Additional Information:	© 2019 IEEE.
Keywords:	Deep neural network; acoustic model adaptation; hybrid; tandem; stacked hybrid
Dates:	Published (online): 14 July 2014; Published: 14 July 2014
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	13 Nov 2019 10:29
Last Modified:	13 Nov 2019 10:29
Status:	Published
Publisher:	IEEE
Refereed:	Yes
Identification Number:	10.1109/icassp.2014.6854825
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:152843
