Al-Shareef, S. and Hain, T. orcid.org/0000-0003-0939-3464 (2016) Colloquialising modern standard Arabic text for improved speech recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Interspeech 2016, 08-12 Sep 2016, San Francisco, USA, pp. 1345-1349.
Abstract
Modern standard Arabic (MSA) is the official language of spoken and written Arabic media. Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist in the form of regional dialects. CA is used in informal and everyday conversations, while MSA is used for formal communication. An Arabic speaker switches between the two variants according to the situation. Developing an automatic speech recognition (ASR) system requires a large collection of transcribed speech or text, and for CA dialects such collections are scarce. Unlike MSA, CA has limited textual resources because it exists only as a spoken language without a standardised written form. This paper addresses the data sparsity of CA textual resources and proposes a strategy that emulates a native speaker colloquialising MSA text, using a machine translation (MT) framework, so that the output can be used in CA language models (LMs). Empirical results on Levantine CA show that LMs estimated from colloquialised MSA data outperformed MSA LMs, with a relative perplexity reduction of up to 68%. In addition, interpolating colloquialised MSA LMs with a CA LM improved speech recognition performance by 4% relative.
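The abstract reports results as relative perplexity reductions and as gains from interpolating two LMs. The sketch below shows how those quantities are computed in general. It is not the authors' system: the abstract does not state the LM order, smoothing or toolkit, so this assumes toy add-one-smoothed unigram models, hypothetical placeholder tokens instead of real Arabic text, and an arbitrarily chosen interpolation weight. The numbers it prints have no relation to the paper's results.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary and corpora (placeholders, not real Arabic data).
VOCAB = ["shared", "msa_only", "ca_only"]

def unigram_lm(tokens, vocab):
    """Add-one (Laplace) smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def interpolate(lm_a, lm_b, lam):
    """Linear interpolation: P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    return {w: lam * lm_a[w] + (1.0 - lam) * lm_b[w] for w in lm_a}

def perplexity(lm, tokens):
    """PPL = exp(-(1/N) * sum_i log P(w_i))."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

def relative_reduction(baseline, new):
    """Relative improvement of `new` over `baseline`, e.g. 0.68 for 68%."""
    return (baseline - new) / baseline

msa_train = ["shared"] * 4 + ["msa_only"] * 4                     # stand-in for MSA text
coll_train = ["shared"] * 1 + ["ca_only"] * 6 + ["msa_only"] * 1  # stand-in for colloquialised MSA
ca_train = ["shared"] * 3 + ["ca_only"] * 1                       # stand-in for a small CA corpus
test = ["shared", "ca_only"]                                      # stand-in for CA test text

lm_msa = unigram_lm(msa_train, VOCAB)
lm_coll = unigram_lm(coll_train, VOCAB)
lm_ca = unigram_lm(ca_train, VOCAB)
lm_mix = interpolate(lm_coll, lm_ca, lam=0.5)  # weight chosen arbitrarily for the demo

ppl_msa = perplexity(lm_msa, test)
ppl_coll = perplexity(lm_coll, test)
ppl_ca = perplexity(lm_ca, test)
ppl_mix = perplexity(lm_mix, test)

for name, ppl in [("MSA", ppl_msa), ("colloquialised", ppl_coll),
                  ("CA", ppl_ca), ("interpolated", ppl_mix)]:
    print(f"{name:>15}: PPL = {ppl:.3f}")
print(f"relative PPL reduction vs MSA: {relative_reduction(ppl_msa, ppl_coll):.1%}")
```

A figure such as "68% relative" is read as (PPL_baseline − PPL_new) / PPL_baseline. In practice the interpolation weight is usually tuned on held-out data rather than fixed at 0.5 as in this toy example.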
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Al-Shareef, S.; Hain, T. |
Copyright, Publisher and Additional Information: | © 2016 ISCA. This is an author produced version of a paper subsequently published in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Colloquial Arabic; dialectal Arabic; language modelling; transfer learning; machine translation |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Dec 2016 15:52 |
Last Modified: | 19 Dec 2022 13:35 |
Published Version: | http://doi.org/10.21437/Interspeech.2016-788 |
Status: | Published |
Refereed: | Yes |
Identification Number: | 10.21437/Interspeech.2016-788 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:109283 |