This is the latest version of this eprint.
Farooq, M.U., Ahmad, R. orcid.org/0000-0002-0194-6653 and Hain, T. orcid.org/0000-0003-0939-3464 (2024) MUST: A MUltilingual Student-Teacher learning approach for low-resource speech recognition. In: Proceedings of 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 16-20 Dec 2023, Taipei, Taiwan. Institute of Electrical and Electronics Engineers (IEEE), pp. 1-6. ISBN: 9798350306903 ISSN: 2997-6928 EISSN: 2997-6995
Abstract
Student-teacher learning, or knowledge distillation (KD), has previously been used to address the data scarcity issue in training automatic speech recognition (ASR) systems. However, a limitation of KD training is that the student model's classes must be a subset (proper or otherwise) of the teacher model's classes, which prevents distillation even from acoustically similar languages if their character sets are not the same. In this work, that limitation is addressed by proposing MUltilingual Student-Teacher (MUST) learning, which exploits a posterior-mapping approach: a pre-trained mapping model maps posteriors from a teacher language to the student-language ASR, and these mapped posteriors are used as soft labels for KD learning. Various teacher-ensemble schemes are explored for training ASR models for low-resource languages. A model trained with MUST learning reduces character error rate (CER) by up to 9.5% relative in comparison with a baseline monolingual ASR.
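The core idea in the abstract — mapping a teacher language's posteriors into the student language's class space so they can serve as soft labels for KD — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the mapping matrix here is a randomly generated stand-in for the pre-trained mapping model, and the frame-level cross-entropy loss is one common choice of KD objective.

```python
import numpy as np

def map_posteriors(teacher_post, mapping):
    """Map per-frame teacher posteriors (T, V_teacher) into the
    student class space via a (V_teacher, V_student) mapping matrix."""
    mapped = teacher_post @ mapping
    # Renormalise so each frame is a valid distribution over student classes.
    return mapped / mapped.sum(axis=1, keepdims=True)

def kd_loss(student_log_probs, soft_labels):
    """Frame-level cross-entropy between mapped teacher posteriors
    (soft labels) and the student's log-probabilities."""
    return -np.mean(np.sum(soft_labels * student_log_probs, axis=1))

rng = np.random.default_rng(0)
# 5 frames of teacher posteriors over 40 teacher-language classes.
teacher = rng.dirichlet(np.ones(40), size=5)
# Hypothetical pre-trained mapping onto 30 student-language classes
# (random stand-in here; in MUST this comes from a trained mapping model).
mapping = rng.dirichlet(np.ones(30), size=40)
# Student network outputs, converted to log-probabilities (log-softmax).
student_logits = rng.normal(size=(5, 30))
log_probs = student_logits - np.log(
    np.exp(student_logits).sum(axis=1, keepdims=True))

soft = map_posteriors(teacher, mapping)
loss = kd_loss(log_probs, soft)
```

In practice this KD term would be combined with the usual supervised ASR loss, and the mapped posteriors from several teacher languages could be ensembled before distillation, per the teacher-ensemble schemes the abstract mentions.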
Metadata
Item Type: Proceedings Paper
Authors/Creators: Farooq, M.U.; Ahmad, R.; Hain, T.
Copyright, Publisher and Additional Information: © 2023 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords: multilingual; knowledge distillation; automatic speech recognition; low-resource languages
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User: Symplectic Sheffield
Date Deposited: 08 Aug 2025 10:19
Last Modified: 08 Aug 2025 10:21
Status: Published
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Refereed: Yes
Identification Number (DOI): 10.1109/asru57964.2023.10389636
Open Archives Initiative ID (OAI ID): oai:eprints.whiterose.ac.uk:230160
Available Versions of this Item
- MUST: a multilingual student-teacher learning approach for low-resource speech recognition. (deposited 07 Aug 2025 16:07)
- MUST: A MUltilingual Student-Teacher learning approach for low-resource speech recognition. (deposited 08 Aug 2025 10:19) [Currently Displayed]