Park, C. and Hain, T. orcid.org/0000-0003-0939-3464 (2025) Semi-supervised learning for automatic speech recognition with word error rate estimation and targeted domain data selection. In: Scharenborg, O., Oertel, C. and Truong, K., (eds.) Proceedings of Interspeech 2025. Interspeech 2025, 17-21 Aug 2025, Rotterdam, The Netherlands. International Speech Communication Association (ISCA), pp. 3663-3667. ISSN: 2958-1796. EISSN: 2958-1796
Abstract
There is a growing demand for leveraging untranscribed multi-domain data in semi-supervised learning (SSL) for automatic speech recognition (ASR) to broaden its applications. However, domain mismatch between source and target data can limit the performance gains of SSL, even when the automatic transcripts used for training are highly accurate. While word error rate (WER) estimation (WE) methods for automatic transcription have advanced, they remain insufficient for handling multi-domain data. This paper proposes a novel data selection method for SSL in ASR that integrates WE and acoustic domain similarity (ADS). For WE, multi-target regression for error rate prediction (MTR-ER) is introduced; ADS, measured using noise-contrastive estimation, is incorporated as an additional selection criterion. The effectiveness of this approach is demonstrated through comparisons with a confidence-based method. Results show that combining WE and ADS achieves 26.66% of the performance improvement expected from fully supervised learning.
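The abstract describes data selection as the combination of two per-utterance signals: an estimated WER and an acoustic domain similarity score. The Python sketch below illustrates one way such a filter could look; it is not the authors' implementation. It assumes both signals are already computed: `predicted_wer` stands in for the output of a WER estimator such as MTR-ER, and `domain_logit` for the logit of a binary target-vs-source classifier, which under noise-contrastive estimation approximates the log density ratio up to a constant. The threshold-then-rank rule, the threshold values, the toy data and all names (`Utterance`, `select_for_ssl`, `max_wer`, `min_ads`) are hypothetical, since the abstract does not state how the two criteria are combined.

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    hypothesis: str       # automatic transcript from the seed ASR model
    predicted_wer: float  # hypothetical WER-estimator output (e.g. MTR-ER), 0.0-1.0
    domain_logit: float   # hypothetical logit of a target-vs-source domain classifier


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acoustic_domain_similarity(u: Utterance) -> float:
    """Probability that the utterance belongs to the target domain.

    For an NCE-style binary classifier, the logit approximates
    log p_target(x) - log p_source(x) plus a constant set by the class ratio,
    so higher values mean more target-like audio.
    """
    return sigmoid(u.domain_logit)


def select_for_ssl(utts, max_wer=0.10, min_ads=0.5):
    """Keep utterances with a low estimated WER AND a high domain similarity."""
    kept = [
        u for u in utts
        if u.predicted_wer <= max_wer and acoustic_domain_similarity(u) >= min_ads
    ]
    # Lowest estimated WER first; ties broken by stronger target-domain evidence.
    kept.sort(key=lambda u: (u.predicted_wer, -u.domain_logit))
    return kept


# Toy pool of automatically transcribed utterances (made-up numbers).
pool = [
    Utterance("a", "turn left at the lights", 0.05, 2.0),      # accurate, in-domain
    Utterance("b", "the markets rallied today", 0.03, -3.0),   # accurate, out-of-domain
    Utterance("c", "turn rite at the lice", 0.30, 1.5),        # in-domain, poor transcript
    Utterance("d", "stop at the next junction", 0.08, 0.5),    # accurate, in-domain
]

selected = select_for_ssl(pool)
print([u.utt_id for u in selected])  # ['a', 'd']
```

Keeping the two criteria as separate thresholds makes their individual effect visible: utterance `b` has the lowest estimated WER but is dropped as out of domain, which is the situation the abstract describes, where accurate transcripts alone do not guarantee gains under domain mismatch.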
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2025 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of Interspeech 2025 is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | speech recognition; semi-supervised learning; word error rate estimation; acoustic domain similarity |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 07 Aug 2025 15:03 |
Last Modified: | 20 Aug 2025 13:21 |
Published Version: | https://www.isca-archive.org/interspeech_2025/park... |
Status: | Published |
Publisher: | International Speech Communication Association (ISCA) |
Refereed: | Yes |
Identification Number: | 10.21437/Interspeech.2025-191 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230076 |