Iakovenko, O. orcid.org/0000-0002-7801-6585 and Hain, T. orcid.org/0000-0003-0939-3464 (2024) Methods of automatic matrix language determination for code-switched speech. In: Al-Onaizan, Y., Bansal, M. and Chen, Y.-N., (eds.) Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 12-16 Nov 2024, Miami, Florida, USA. Association for Computational Linguistics, pp. 5791-5800. ISBN: 9798891761643
Abstract
Code-switching (CS) is the process of speakers alternating between two or more languages, which is becoming increasingly common in the modern world. To better describe CS speech, the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language (ML), the language that provides the grammatical structure for a CS utterance. In this work, the MLF theory was used to develop systems for Matrix Language Identity (MLID) determination. The MLID of English/Mandarin and English/Spanish CS text and speech was compared to acoustic language identity (LID), which is a typical way to identify a language in monolingual utterances. MLID predictors from audio show higher correlation with the textual principles than LID in all cases, while also outperforming LID in an MLID recognition task in terms of macro F1 (60%) and correlation score (0.38). This novel approach has identified that the non-English languages (Mandarin and Spanish) are preferred over English as the ML, contrary to the monolingual choice of LID.
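The two evaluation measures named in the abstract, macro-averaged F1 and a correlation score, can be illustrated on toy labels. The following is a minimal sketch and not the authors' evaluation code: the label lists are invented for illustration, the function names `macro_f1` and `pearson` are hypothetical, and the use of the Pearson coefficient is an assumption, since the abstract does not say which correlation measure was used.

```python
from math import sqrt


def macro_f1(reference, predicted):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(reference) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(reference, predicted))
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pearson(x, y):
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / sqrt(var_x * var_y)


# Invented toy labels: 1 = English is the Matrix Language, 0 = the other language.
reference = [0, 0, 1, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 0]

print(f"macro F1: {macro_f1(reference, predicted):.3f}")
print(f"correlation: {pearson(reference, predicted):.3f}")
```

Macro averaging weights both classes equally, so a predictor that always picks the majority language is penalised even when one Matrix Language dominates the data.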
Metadata
Item Type: | Proceedings Paper |
---|---|
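Authors/Creators: | Iakovenko, O. (orcid.org/0000-0002-7801-6585); Hain, T. (orcid.org/0000-0003-0939-3464) |
Editors: | Al-Onaizan, Y.; Bansal, M.; Chen, Y.-N. |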
Copyright, Publisher and Additional Information: | © 2024 Association for Computational Linguistics. This paper is made available under a Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0/) |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 06 Aug 2025 12:29 |
Last Modified: | 06 Aug 2025 12:29 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/2024.emnlp-main.330 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230159 |