Character error rate estimation for automatic speech recognition of short utterances

Park, C. orcid.org/0000-0001-6671-1671, Kang, H. and Hain, T. orcid.org/0000-0003-0939-3464 (2024) Character error rate estimation for automatic speech recognition of short utterances. In: Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO). 2024 32nd European Signal Processing Conference (EUSIPCO), 26-30 Aug 2024, Lyon, France. Institute of Electrical and Electronics Engineers (IEEE), pp. 131-135. ISBN 9798331519773

Abstract

The quality of an automatic speech recognition (ASR) system’s output can be measured by comparing it with a gold-standard reference. Evaluating an error rate (ER) in this way is costly, because it requires reference transcriptions, and is therefore not always possible. Instead, one can aim to estimate quality without an explicit reference. Prior work has concentrated on confidence scoring or word error rate (WER) estimation. The latter is typically model-based, and the performance of a WER estimation model has been found to degrade when it is trained on short utterances. To address this issue, this work presents an ER estimation model based on character error rate (CER), called Fe-CER. The model estimates the ER of an ASR system’s output using character-level tokenisation, which gives higher resolution on relatively short utterances. Fe-CER is compared with ER estimation models using phonemes, byte-pair encoding (BPE) tokens and words. Model performance is measured using normalised root mean square error (nRMSE), which takes into account the different distributions of target ERs, and the Pearson correlation coefficient (PCC). Fe-CER trained on CHiME-5 outperforms the WER-based baseline model by 6.00% relative in nRMSE and 8.79% relative in PCC.
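The quantities the abstract relies on can be made concrete with a short sketch. The Python below (the record contains no code; every function name here is hypothetical) computes reference-based WER and CER through Levenshtein alignment, which is the kind of target an estimator such as Fe-CER learns to predict without a reference, together with the two evaluation metrics. It does not implement Fe-CER itself. Two assumptions are labelled in the code: spaces count as characters in CER, and nRMSE is taken as RMSE divided by the standard deviation of the target ERs; the paper's exact definitions may differ.

```python
from math import sqrt


def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between token lists."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Number of edit errors divided by the number of reference tokens."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    # Word-level tokenisation: split on whitespace.
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    # Character-level tokenisation.
    # Assumption: spaces are counted as characters; implementations differ on this.
    return error_rate(list(ref), list(hyp))


def nrmse(targets, estimates):
    # Assumption: RMSE normalised by the (population) standard deviation of the
    # target ERs; the paper's exact normalisation may differ.
    n = len(targets)
    rmse = sqrt(sum((t - e) ** 2 for t, e in zip(targets, estimates)) / n)
    mean = sum(targets) / n
    std = sqrt(sum((t - mean) ** 2 for t in targets) / n)
    return rmse / std


def pcc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


ref, hyp = "yeah okay", "yeah ok"
print(wer(ref, hyp))            # 0.5
print(round(cer(ref, hyp), 3))  # 0.222
```

On this two-word utterance WER can only move in steps of 0.5, so a single near-miss word costs half the score, whereas CER registers the same error as 2/9 ≈ 0.222. This finer granularity is what the abstract means by higher resolution on relatively short utterances.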

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Park, C. https://orcid.org/0000-0001-6671-1671 Kang, H. Hain, T. https://orcid.org/0000-0003-0939-3464
Copyright, Publisher and Additional Information:	© 2024 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords:	Automatic speech recognition; Error rate estimation; Word error rate; Character error rate; Tokenisation
Dates:	Published (online): 23 October 2024; Published: 23 October 2024
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	05 Jun 2025 12:22
Last Modified:	05 Jun 2025 15:32
Status:	Published
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Identification Number:	10.23919/eusipco63174.2024.10715433
Related URLs:	Author Conference
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:227353
