Flynn, R. and Ragni, A. orcid.org/0000-0003-0634-4456 (2023) Leveraging cross-utterance context for ASR decoding. In: Proceedings of Interspeech 2023. INTERSPEECH 2023, 20-24 Aug 2023, Dublin, Ireland. ISCA - International Speech Communication Association, pp. 1359-1363.
Abstract
While external language models (LMs) are often incorporated into the decoding stage of automatic speech recognition systems, these models usually operate with limited context. Cross-utterance information has been shown to be beneficial during second-pass rescoring; however, rescoring can only reorder the hypotheses produced by the first pass, whose LM sees only local context. In this work, we investigate incorporating long-context transformer LMs into beam-search decoding of acoustic models, so that cross-utterance context is used during the search itself, and compare the results against n-best rescoring. Results demonstrate that beam search makes better use of cross-utterance context. On the long-form AMI dataset, it yields absolute word error rate (WER) reductions of 0.7% and 0.3% on the dev and test sets compared to the single-utterance setting, with improvements observed when including up to 500 tokens of prior context. Evaluations on Tedlium-1 show smaller improvements of around 0.1% absolute.
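The abstract's central contrast is that n-best rescoring can only reorder hypotheses the first pass already kept, whereas beam search lets a context-aware LM influence which hypotheses survive pruning. The toy sketch below illustrates that point under stated assumptions; it is not the authors' implementation. Everything in it is hypothetical: the per-step acoustic scores, the prior utterance, the beam width of 2, and the LM, which is a unigram cache model standing in for the long-context transformer LMs used in the paper. AM and LM scores are combined by shallow fusion with an assumed LM weight of 0.3.

```python
import math

# Hypothetical per-step acoustic log-probabilities (a toy word-level lattice).
AM_STEPS = [
    {"we": -0.1, "wee": -2.3},
    {"visited": -0.2, "visit": -0.6},
    {"the": -0.05, "a": -3.0},
    {"query": -0.4, "quarry": -1.1},
]
VOCAB_SIZE = 1000  # assumed size of a uniform background vocabulary


def lm_logprob(word, history, alpha=0.5):
    """Hypothetical stand-in for a long-context LM: a unigram cache model that
    interpolates a uniform background with word counts from the history."""
    background = 1.0 / VOCAB_SIZE
    cache = history.count(word) / len(history) if history else 0.0
    return math.log((1 - alpha) * background + alpha * cache)


def beam_search(prior, beam=2, lm_weight=0.3):
    """Shallow fusion: each step adds AM log-prob + lm_weight * LM log-prob,
    with the LM conditioned on prior utterances plus the partial hypothesis."""
    hyps = [((), 0.0, 0.0)]  # (words, acoustic score, fused score)
    for step in AM_STEPS:
        expanded = []
        for words, am_total, fused in hyps:
            history = prior + list(words)
            for w, am in step.items():
                lm = lm_logprob(w, history)
                expanded.append((words + (w,), am_total + am, fused + am + lm_weight * lm))
        expanded.sort(key=lambda h: h[2], reverse=True)
        hyps = expanded[:beam]  # pruning: anything outside the beam is lost
    return hyps


def rescore(nbest, prior, lm_weight=0.3):
    """Second pass: re-rank a fixed n-best list using the context-aware LM."""
    def total(h):
        words, am_total, _ = h
        lm = sum(lm_logprob(w, prior + list(words[:i])) for i, w in enumerate(words))
        return am_total + lm_weight * lm
    return max(nbest, key=total)


prior = "the old quarry was flooded".split()  # hypothetical previous utterance
first_pass = beam_search(prior=[])            # single-utterance first pass
print(" ".join(rescore(first_pass, prior)[0]))  # we visited the query
print(" ".join(beam_search(prior)[0][0]))       # we visited the quarry
```

Here "quarry" is pruned in the context-free first pass, so rescoring with the prior utterance cannot recover it, while fusing the same LM into the beam search keeps it and selects it.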
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Flynn, R. and Ragni, A. |
Copyright, Publisher and Additional Information: | © 2023 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of Interspeech 2023 is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | speech recognition; language modelling; cross-utterance; beam search; rescoring |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: META PLATFORM INC; Grant number: UNSPECIFIED |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 05 Jun 2024 11:16 |
Last Modified: | 05 Jun 2024 13:05 |
Status: | Published |
Publisher: | ISCA - International Speech Communication Association |
Refereed: | Yes |
Identification Number: | 10.21437/interspeech.2023-1941 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:213149 |
Download
Filename: INTERSPEECH_2023_Paper_Kit__camera_ready.pdf
Licence: CC-BY 4.0