Alosaimy, A and Atwell, E orcid.org/0000-0001-9395-3764 (2017) Sunnah Arabic Corpus: Design and Methodology. In: Proceedings of the 5th International Conference on Islamic Applications in Computer Science and Technologies (IMAN 2017). IMAN 2017, 26-28 Dec 2017, Semarang, Indonesia.
Abstract
Sunnah Arabic Corpus is an annotated linguistic resource consisting of 144K words/170K tokens of Hadith narratives (utterances attributed to the Prophet Mohammed) extracted from the book Riyāḍu Aṣṣāliḥīn. As a first layer of annotation, the corpus has been fully diacritized. In addition, each orthographic word is segmented into its syntactic words, and each syntactic word is tagged with its part of speech and multiple morphological features. Several translations of the hadiths into different languages are provided and aligned at the narrative/paragraph level. The Sunnah Arabic Corpus follows the standards of the successful Quranic Arabic Corpus (corpus.quran.com). It is freely available under the Creative Commons Attribution-ShareAlike 4.0 International License.
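The annotation layers described in the abstract can be pictured as a nested structure: each narrative holds diacritized orthographic words, each orthographic word is split into syntactic-word segments that carry a part-of-speech tag and morphological features, and translations are attached per narrative. The following is a minimal Python sketch under stated assumptions: the class and field names (`Segment`, `OrthographicWord`, `Narrative`, `hadith_id`) are hypothetical, the tags `CONJ`/`V` and feature values `PERF`/`3MS` are illustrative labels in the style of the Quranic Arabic Corpus tagset, and forms are written in Buckwalter transliteration for readability. It does not reproduce the corpus's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One syntactic word (e.g. a clitic or a stem) inside an orthographic word."""
    form: str  # diacritized surface form of the segment
    pos: str   # part-of-speech tag (illustrative, QAC-style tag names)
    features: dict[str, str] = field(default_factory=dict)  # morphological features


@dataclass
class OrthographicWord:
    """A fully diacritized orthographic word and its segmentation."""
    form: str
    segments: list[Segment]

    def surface(self) -> str:
        # Concatenating the segments should reproduce the orthographic form.
        return "".join(s.form for s in self.segments)


@dataclass
class Narrative:
    """A hadith narrative with paragraph-level aligned translations (hypothetical layout)."""
    hadith_id: str                # hypothetical identifier
    words: list[OrthographicWord]
    translations: dict[str, str]  # language code -> translated narrative

    def token_count(self) -> int:
        # Tokens here are syntactic words, so one orthographic word may yield several.
        return sum(len(w.segments) for w in self.words)


# "waqaAla" ("and he said") = conjunction "wa" + perfect verb "qaAla"
said = OrthographicWord(
    form="waqaAla",
    segments=[
        Segment("wa", "CONJ"),
        Segment("qaAla", "V", {"aspect": "PERF", "person_gender_number": "3MS"}),
    ],
)
narrative = Narrative("example-1", [said], {"en": "And he said ..."})
print(said.surface(), narrative.token_count())
```

Keeping segments nested under their orthographic word mirrors how a clitic-rich Arabic word counts once as a word but several times as tokens, which is why a corpus of this kind reports separate word and token totals.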
Metadata
Field | Value |
---|---|
Item Type: | Proceedings Paper |
Copyright, Publisher and Additional Information: | This is an author produced version of the paper 'Sunnah Arabic Corpus: Design and Methodology', presented at IMAN 2017. |
Keywords: | Arabic, corpus, annotation, Hadith, Sunnah, morphology |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 03 Jan 2018 16:06 |
Last Modified: | 22 Mar 2018 07:13 |
Status: | Published |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:125569 |