Meghanani, A. orcid.org/0000-0002-0811-274X and Hain, T. orcid.org/0000-0003-0939-3464 (2024) Improving acoustic word embeddings through correspondence training of self-supervised speech representations. In: Graham, Y. and Purver, M., (eds.) Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), 17-22 Mar 2024, St. Julian’s, Malta. Association for Computational Linguistics, pp. 1959-1967. ISBN 9798891760882
Abstract
Acoustic word embeddings (AWEs) are vector representations of spoken words. An effective method for obtaining AWEs is the Correspondence Auto-Encoder (CAE). In the past, the CAE method has been associated with traditional MFCC features. Representations obtained from self-supervised learning (SSL)-based speech models such as HuBERT and Wav2vec2 are outperforming MFCC in many downstream tasks. However, they have not been well studied in the context of learning AWEs. This work explores the effectiveness of CAE with SSL-based speech representations to obtain improved AWEs. Additionally, the capabilities of SSL-based speech models are explored in cross-lingual scenarios for obtaining AWEs. Experiments are conducted on five languages: Polish, Portuguese, Spanish, French, and English. The HuBERT-based CAE model achieves the best results for word discrimination in all languages, despite HuBERT being pre-trained on English only. The HuBERT-based CAE model also works well in cross-lingual settings: when trained on one source language and tested on target languages, it outperforms MFCC-based CAE models trained on the target languages.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Meghanani, A. (orcid.org/0000-0002-0811-274X); Hain, T. (orcid.org/0000-0003-0939-3464) |
Editors: | Graham, Y.; Purver, M. |
Copyright, Publisher and Additional Information: | © 2024 Association for Computational Linguistics. Licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | Cognitive and Computational Psychology; Language, Communication and Culture; Psychology |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Jul 2025 16:45 |
Last Modified: | 18 Jul 2025 16:46 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/2024.eacl-long.118 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:229425 |