Unsupervised data selection for speech recognition with contrastive loss ratios

Park, C. orcid.org/0000-0001-6671-1671, Ahmad, R. orcid.org/0000-0002-0194-6653 and Hain, T. orcid.org/0000-0003-0939-3464 (2022) Unsupervised data selection for speech recognition with contrastive loss ratios. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23-27 May 2022, Singapore, Singapore. Institute of Electrical and Electronics Engineers (IEEE), pp. 8587-8591. ISBN: 9781665405416. ISSN: 1520-6149. EISSN: 2379-190X.

Abstract

This paper proposes an unsupervised data selection method by using a submodular function based on contrastive loss ratios of target and training data sets. A model using a contrastive loss function is trained on both sets. Then the ratio of frame-level losses for each model is used by a submodular function. By using the submodular function, a training set for automatic speech recognition matching the target data set is selected. Experiments show that models trained on the data sets selected by the proposed method outperform the selection method based on log-likelihoods produced by GMM-HMM models, in terms of word error rate (WER). When selecting a fixed amount, e.g. 10 hours of data, the difference between the results of two methods on Tedtalks was 20.23% WER relative. The method can also be used to select data with the aim of minimising negative transfer, while maintaining or improving on performance of models trained on the whole training set. Results show that the WER on the WSJCAM0 data set was reduced by 6.26% relative when selecting 85% from the whole data set.
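As a rough illustration of the pipeline described in the abstract, the sketch below scores each utterance by the mean log ratio of frame-level losses from two models and then greedily maximises a submodular objective under a duration budget. The abstract does not state which submodular function the paper uses, so the objective here (a sum of square roots of per-cluster weight mass), the cluster assignments, the sign convention of the ratio, and all function and parameter names are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence


def loss_ratio_score(target_losses: Sequence[float],
                     train_losses: Sequence[float]) -> float:
    """Mean log ratio of frame-level losses for one utterance.

    Assumed convention (not taken from the paper): a positive score means the
    model trained on the target set assigns a lower loss than the model
    trained on the training pool, i.e. the utterance looks target-like.
    """
    logs = [math.log(tr / tg) for tg, tr in zip(target_losses, train_losses)]
    return sum(logs) / len(logs)


def greedy_select(weights: Dict[str, float],
                  durations: Dict[str, float],
                  clusters: Dict[str, int],
                  budget: float) -> List[str]:
    """Cost-scaled greedy maximisation of a hypothetical submodular objective
    f(S) = sum over clusters c of sqrt(total weight of S inside c),
    subject to a total-duration budget. Weights must be non-negative.
    """
    selected: List[str] = []
    cluster_mass: Dict[int, float] = {}
    used = 0.0
    remaining = set(weights)
    while remaining:
        best, best_gain = None, 0.0
        # Sorted iteration keeps tie-breaking deterministic.
        for utt in sorted(remaining):
            if used + durations[utt] > budget:
                continue  # would exceed the duration budget
            mass = cluster_mass.get(clusters[utt], 0.0)
            # Marginal gain per unit of duration; shrinks as a cluster fills up.
            gain = (math.sqrt(mass + weights[utt]) - math.sqrt(mass)) / durations[utt]
            if gain > best_gain:
                best, best_gain = utt, gain
        if best is None:
            break  # nothing fits the budget or adds positive gain
        selected.append(best)
        used += durations[best]
        c = clusters[best]
        cluster_mass[c] = cluster_mass.get(c, 0.0) + weights[best]
        remaining.remove(best)
    return selected


# Toy usage with made-up numbers: "a" and "b" share a cluster, "c" is alone.
weights = {"a": 4.0, "b": 3.0, "c": 1.0}
durations = {"a": 1.0, "b": 1.0, "c": 1.0}
clusters = {"a": 0, "b": 0, "c": 1}
picked = greedy_select(weights, durations, clusters, budget=2.0)
```

In this toy case the second pick goes to the lower-weight utterance from the uncovered cluster, because the square root gives diminishing returns within a cluster. In practice the weights would be derived from the loss-ratio scores, for example by clipping negative scores to zero, which is again an assumption rather than the paper's stated procedure.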

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Park, C. https://orcid.org/0000-0001-6671-1671; Ahmad, R. https://orcid.org/0000-0002-0194-6653; Hain, T. https://orcid.org/0000-0003-0939-3464
Copyright, Publisher and Additional Information:	© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy.
Keywords:	data selection; unsupervised; contrastive loss; submodular; speech recognition
Dates:	Published: 27 April 2022
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	18 Jul 2025 15:09
Last Modified:	18 Jul 2025 15:09
Status:	Published
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Identification Number:	10.1109/icassp43922.2022.9747390
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:229407
