Liu, Y., Karanasou, P. and Hain, T. (2015) An Investigation into Speaker Informed DNN Front-end for LVCSR. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 19-24 Apr 2015, Brisbane, Australia. IEEE Conference Publications. IEEE, IEEE Xplore ISBN 978-1-4673-6997-8/15
Abstract
Deep neural networks (DNNs) have become a standard method in many ASR tasks. Recently there has been considerable interest in "informed training" of DNNs, where the DNN input is augmented with auxiliary codes such as i-vectors, speaker codes, or speaker separation bottleneck (SSBN) features. This paper compares different speaker-informed DNN training methods on an LVCSR task. We discuss the mathematical equivalence between speaker-informed DNN training and "bias adaptation", which uses speaker-dependent biases, and give a detailed analysis of influential factors such as the dimension, discrimination and stability of the auxiliary codes. The analysis is supported by experiments on a meeting recognition task using a bottleneck-feature-based system. Results show that i-vector-based adaptation is also effective in bottleneck-feature-based systems (not just hybrid systems). However, all tested methods generalise poorly to unseen speakers. We introduce a system based on speaker classification followed by speaker adaptation of biases, which yields performance equivalent to an i-vector-based system, with a 10.4% relative improvement over the baseline on seen speakers. The new approach can serve as a fast alternative, especially for short utterances.
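The equivalence mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's derivation, only one common reading of it: assuming the auxiliary code c is fixed for a given speaker and is concatenated with the acoustic input x at the first layer, the pre-activation [W_x | W_c][x; c] + b equals W_x x + b_s, where b_s = W_c c + b acts as a speaker-dependent bias. All names, dimensions and random values below are hypothetical and chosen only for illustration.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def first_layer_augmented(W_x, W_c, b, x, c):
    """Pre-activation when input x is concatenated with speaker code c."""
    W = [row_x + row_c for row_x, row_c in zip(W_x, W_c)]  # [W_x | W_c]
    z = matvec(W, x + c)  # x + c concatenates the two lists
    return [z_i + b_i for z_i, b_i in zip(z, b)]

def speaker_bias(W_c, b, c):
    """Fold the speaker code into a speaker-dependent bias b_s = W_c c + b."""
    return [w_i + b_i for w_i, b_i in zip(matvec(W_c, c), b)]

def first_layer_biased(W_x, b_s, x):
    """Pre-activation using only acoustic input x and the speaker bias b_s."""
    return [z_i + b_i for z_i, b_i in zip(matvec(W_x, x), b_s)]

# Hypothetical sizes: 4 hidden units, 6 acoustic features, 3-dim speaker code.
random.seed(0)
hidden, feat_dim, code_dim = 4, 6, 3
W_x = [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(hidden)]
W_c = [[random.gauss(0, 1) for _ in range(code_dim)] for _ in range(hidden)]
b = [random.gauss(0, 1) for _ in range(hidden)]
x = [random.gauss(0, 1) for _ in range(feat_dim)]  # one acoustic frame
c = [random.gauss(0, 1) for _ in range(code_dim)]  # one speaker's code

z_aug = first_layer_augmented(W_x, W_c, b, x, c)
z_bias = first_layer_biased(W_x, speaker_bias(W_c, b, c), x)

# The two pre-activations agree up to floating-point rounding.
max_diff = max(abs(p - q) for p, q in zip(z_aug, z_bias))
print(max_diff < 1e-9)
```

Under this assumption, the dimension of the code only affects the result through the product W_c c, which is one way to see why properties of the code such as dimension and stability could matter for the resulting bias.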
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Liu, Y., Karanasou, P. and Hain, T. |
Copyright, Publisher and Additional Information: | © 2015 IEEE. This is an author produced version of a paper subsequently published in IEEE Conference Proceedings. Uploaded in accordance with the publisher's self-archiving policy. Copyright 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 15 Oct 2015 15:33 |
Last Modified: | 19 Dec 2022 13:31 |
Published Version: | https://doi.org/10.1109/ICASSP.2015.7178782 |
Status: | Published |
Publisher: | IEEE |
Series Name: | IEEE Conference Publications |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP.2015.7178782 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:86695 |