Deena, S. orcid.org/0000-0001-5417-0556, Hasan, M., Doulaty, M. et al. (2 more authors) (2019) Recurrent neural network language model adaptation for multi-genre broadcast speech recognition and alignment. IEEE/ACM Transactions on Audio, Speech and Language Processing, 27 (3). pp. 572-582. ISSN 2329-9290
Abstract
Recurrent neural network language models (RNNLMs) generally outperform n-gram language models when used in automatic speech recognition. Adapting RNNLMs to new domains is an open problem and current approaches can be categorised as either feature-based and model-based. In feature-based adaptation, the input to the RNNLM is augmented with auxiliary features whilst model-based adaptation includes model fine-tuning and the introduction of adaptation layer(s) in the network. In this paper, the properties of both types of adaptation are investigated on multi-genre broadcast speech recognition. Existing techniques for both types of adaptation are reviewed and the proposed techniques for model-based adaptation, namely the linear hidden network (LHN) adaptation layer and the K-component adaptive RNNLM, are investigated. Moreover, new features derived from the acoustic domain are investigated for RNNLM adaptation. The contributions of this paper include two hybrid adaptation techniques: the fine-tuning of feature-based RNNLMs and a feature-based adaptation layer. Moreover, the semi-supervised adaptation of RNNLMs using genre information is also proposed. The ASR systems were trained using 700h of multi-genre broadcast speech. The gains obtained when using the RNNLM adaptation techniques proposed in this work are consistent when using RNNLMs trained on an in-domain set of 10M words and on a combination of in-domain and out-of-domain sets of 660M words, with approx. 10% perplexity and 2% relative word error rate improvements on a 28.3h. test set. The best RNNLM adaptation techniques for ASR are also evaluated on a lightly supervised alignment of subtitles task for the same data, where the use of RNNLM adaptation leads to an absolute increase in the F-measure of 0.5%.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | Speech recognition; RNNLM; language model adaptation; multi-domain ASR |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder Grant number ENGINEERING AND PHYSICAL SCIENCE RESEARCH COUNCIL (EPSRC) UNSPECIFIED |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 04 Jan 2019 14:26 |
Last Modified: | 04 Jan 2019 14:30 |
Published Version: | https://doi.org/10.1109/TASLP.2018.2888814 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers |
Refereed: | Yes |
Identification Number: | 10.1109/TASLP.2018.2888814 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:140330 |