Alfaifi, AYG, Atwell, E and Hedaya, I (2014) Arabic learner corpus (ALC) v2: a new written and spoken corpus of Arabic learners. In: Ishikawa, S (ed.) Proceedings of Learner Corpus Studies in Asia and the World 2014. International Symposium Learner Corpus Studies in Asia and the World (LCSAW) 2014, May 31 - June 1, 2014, Kobe University, Japan. Kobe International Communication Center, 77-89.
Abstract
Arabic learner corpora have not received enough attention, particularly for learning Arabic as a second language (in Arabic-speaking countries). The literature shows that a few projects are developing Arabic learner corpora, most of which are not freely available to users or researchers. In addition, these corpora are intended to support the acquisition of Arabic as a foreign language (their data were collected from learners studying Arabic in non-Arabic-speaking countries). The present paper introduces the Arabic Learner Corpus (ALC). It is being developed at Leeds University and comprises 282,732 words collected from learners of Arabic in Saudi Arabia. The corpus includes written and spoken data produced by 942 students of 67 different nationalities, studying at pre-university and university levels. The paper focuses on two aspects of the corpus: its design criteria and its content. The discussion of the design criteria covers the target language, the participants, the corpus size, the materials included, the method of data collection, the metadata of the corpus materials and contributors, and text distribution. The second part, on the ALC content, is illustrated using the 26 elements that make up the corpus metadata. The goal of the ALC is to provide an open-source data resource for several areas of linguistic research related to Arabic language learning and teaching. Accordingly, the corpus data are available for download in TXT and XML formats, together with the hand-written sheets in PDF format and the audio recordings in MP3 format.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Alfaifi, AYG; Atwell, E; Hedaya, I |
Editors: | Ishikawa, S |
Copyright, Publisher and Additional Information: | (c) 2014, Alfaifi, AYG, Atwell, E and Hedaya, I. Reproduced with permission from the copyright holder. |
Keywords: | Arabic; learner; corpus; language; materials; data; text |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Jul 2014 10:52 |
Last Modified: | 19 Jan 2018 08:53 |
Published Version: | http://www.lib.kobe-u.ac.jp/repository/81006691.pd... |
Status: | Published |
Publisher: | Kobe International Communication Center |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:79561 |