Park, C. orcid.org/0000-0001-6671-1671, Kang, H. and Hain, T. orcid.org/0000-0003-0939-3464 (2024) Character error rate estimation for automatic speech recognition of short utterances. In: Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO). 2024 32nd European Signal Processing Conference (EUSIPCO), 26-30 Aug 2024, Lyon, France. Institute of Electrical and Electronics Engineers (IEEE), pp. 131-135. ISBN 9798331519773
Abstract
The quality of an automatic speech recognition (ASR) system’s output can be measured by comparing it with a gold standard reference. Evaluating an error rate (ER) in this way is costly and therefore not always possible. Instead, one can aim to provide estimates of quality without an explicit reference. Prior work has concentrated on confidence scoring or word error rate (WER) estimation. The latter is typically model-based, and it was found that the performance of a WER estimation model degrades when it is trained on short utterances. To address this issue, this work presents an ER estimation model using character error rate (CER), called Fe-CER. The model employs character-level tokenisation of the ASR system’s output for higher resolution on relatively short utterances. Fe-CER is compared with other ER estimation models using phonemes, byte-pair encoding tokens, and words. The performance of the models is measured using normalised root mean square error (nRMSE), which takes into consideration the different distributions of target ERs. Fe-CER trained on Chime5 is shown to outperform the baseline model using WER in nRMSE and Pearson correlation coefficient (PCC) by 6.00% and 8.79% relative, respectively.
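As a rough illustration of the quantities named in the abstract, the sketch below computes WER and CER from an edit distance, and scores a set of estimates with nRMSE and PCC. It is not the authors' implementation: the function names are hypothetical, and the abstract does not say how nRMSE is normalised, so dividing the RMSE by the standard deviation of the target error rates is an assumption made here.

```python
import math

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edits over the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: edits over the number of reference characters."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def nrmse(target, estimate):
    """RMSE divided by the population standard deviation of the targets.
    ASSUMPTION: the paper's normalisation may differ."""
    n = len(target)
    rmse = math.sqrt(sum((t - e) ** 2 for t, e in zip(target, estimate)) / n)
    mean_t = sum(target) / n
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target) / n)
    return rmse / std_t

def pcc(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# A three-word utterance with one wrong word.
reference = "turn it off"
hypothesis = "turn it on"
utt_wer = wer(reference, hypothesis)  # 1 of 3 words is wrong
utt_cer = cer(reference, hypothesis)  # 2 edits over 11 reference characters

# Made-up target and estimated error rates, for illustration only.
targets = [0.0, 0.5, 1.0]
estimates = [0.1, 0.6, 1.1]
score_nrmse = nrmse(targets, estimates)
score_pcc = pcc(targets, estimates)
```

For a three-word reference, WER can only take values in steps of one third, whereas CER over the same utterance moves in steps of one eleventh. This is the sense in which character-level scoring gives higher resolution on short utterances.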
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Park, C. (orcid.org/0000-0001-6671-1671); Kang, H.; Hain, T. (orcid.org/0000-0003-0939-3464) |
Copyright, Publisher and Additional Information: | © 2024 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | Automatic speech recognition; Error rate estimation; Word error rate; Character error rate; Tokenisation |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 05 Jun 2025 12:22 |
Last Modified: | 05 Jun 2025 15:32 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.23919/eusipco63174.2024.10715433 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:227353 |
Download
Filename: Chanho___EUSIPCO_2024_v1_4__post_camera_ready_.pdf
Licence: CC-BY 4.0