Al-Sulaiti, L, Abbas, N, Brierley, C et al. (2 more authors) (2016) Compilation of an Arabic Children’s Corpus. In: LREC 2016 Proceedings. LREC 2016: 10th Language Resources and Evaluation Conference, 23-28 May 2016, Portorož, Slovenia. ISBN 978-2-9517408-9-1
Abstract
Inspired by the Oxford Children's Corpus, we have developed a prototype corpus of Arabic texts written and/or selected for children. Our Arabic Children's Corpus of 2950 documents and nearly 2 million words was collected manually from the web during a 3-month project. It is of high quality and contains a range of children's genres, determined by the sources we located, including classic tales from The Arabian Nights and popular fictional characters such as Goha. We anticipate that the current and subsequent versions of our corpus will lead to interesting studies in text classification, language use, and ideology in children's texts.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | This is an author produced version of a paper published in LREC 2016 Proceedings. The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License |
Keywords: | Arabic, Children's Corpus, Genre Classification |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Funding Information: | Funder: EPSRC; Grant number: EP/K015206/1 |
Depositing User: | Symplectic Publications |
Date Deposited: | 20 Jun 2016 11:25 |
Last Modified: | 17 Jan 2018 22:21 |
Published Version: | http://www.lrec-conf.org/proceedings/lrec2016/inde... |
Status: | Published |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:100839 |