Hollands, S. orcid.org/0000-0002-3017-2423, Blackburn, D. orcid.org/0000-0001-8886-1283 and Christensen, H. orcid.org/0000-0003-3028-5062 (2022) Evaluating the performance of state-of-the-art ASR systems on non-native English using corpora with extensive language background variation. In: Interspeech 2022: Proceedings of the Annual Conference of the International Speech Communication Association. Interspeech 2022, 18-22 Sep 2022, Incheon, Korea. Interspeech Proceedings. International Speech Communication Association (ISCA), pp. 3958-3962.
Abstract
This investigation explores how several ASR systems perform on non-native English, using corpora with extensive language background variation. The study takes two corpora covering 191 different native language (L1) backgrounds and examines how well these systems process non-native English (L2) speech. A transformer-based ASR system and a CRDNN architecture are both tested, trained on LibriSpeech [1] and Common Voice [2], for a three-way cross comparison. In addition, Google's Speech-to-Text API and AWS Transcribe were investigated in order to evaluate popular mainstream approaches, given their current degree of impact in deployed systems. Experiments reveal deficits in the range of 10%-15% in mean WER between L1 and L2 speech. Results indicate that ASR systems trained on particular varieties of L2 speech may be effective in improving WERs: in this paper, several Google ASR models trained on varieties of African L2 English outperform L1-trained ASR for under-represented dialect groups in the United Kingdom. Further research is proposed to explore the plausibility of this approach and to critically examine WER as a metric for ASR evaluation, striving instead towards metrics with greater emphasis on evaluating language for communication.
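The abstract's headline figure is a difference in mean word error rate (WER) between L1 and L2 speech. As a rough illustration of what that comparison involves, the sketch below computes WER as the word-level edit distance divided by the reference length, then averages it per utterance for two groups. It assumes whitespace tokenisation and per-utterance averaging; the paper's actual scoring pipeline and text normalisation are not described in this record, and the transcripts shown are invented purely for illustration.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average per-utterance WER over (reference, hypothesis) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("no utterances supplied")
    return sum(scores) / len(scores)


# Hypothetical transcripts, not taken from the paper's corpora.
l1_pairs = [("the cat sat on the mat", "the cat sat on the mat")]
l2_pairs = [("the cat sat on the mat", "the cat sit on mat")]

gap = mean_wer(l2_pairs) - mean_wer(l1_pairs)
print(f"Mean WER gap (L2 - L1): {gap:.1%}")
```

Averaging per utterance weights short and long utterances equally; pooling all errors over all reference words is an equally common convention and can give a different figure on the same data.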
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Hollands, S.; Blackburn, D.; Christensen, H. |
Copyright, Publisher and Additional Information: | © 2022 International Speech Communication Association. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | non-native speech recognition; equality diversity and inclusion |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > School of Medicine and Population Health |
Funding Information: | Engineering and Physical Sciences Research Council (grant number 2431571) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 26 Jan 2024 11:14 |
Last Modified: | 26 Jan 2024 11:14 |
Status: | Published |
Publisher: | International Speech Communication Association (ISCA) |
Series Name: | Interspeech Proceedings |
Refereed: | Yes |
Identification Number: | 10.21437/interspeech.2022-10433 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:207725 |