Alosaimy, A and Atwell, E orcid.org/0000-0001-9395-3764 (2017) Sunnah Arabic Corpus: Design and Methodology. In: Proceedings of the 5th International Conference on Islamic Applications in Computer Science and Technologies (IMAN 2017). IMAN 2017, 26-28 Dec 2017, Semarang, Indonesia.
Abstract
Sunnah Arabic Corpus is an annotated linguistic resource consisting of 144K words/170K tokens of Hadith narratives (utterances attributed to the Prophet Mohammed) extracted from the book Riyāḍu Aṣṣāliḥīn. As a first layer of annotation, the corpus has been fully diacritized. In addition, each orthographic word/token is segmented into its syntactic words, and each syntactic word is tagged with its part of speech and multiple morphological features. Several hadith translations in different languages are provided and aligned at the narrative/paragraph level. The corpus follows the standards of the successful Quranic Arabic Corpus (corpus.quran.com). Sunnah Arabic Corpus is freely available under the Creative Commons Attribution-ShareAlike 4.0 International License.
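The annotation layers described in the abstract can be pictured as a small nested data structure. The following is a minimal sketch only, not the corpus's actual file format or API: the class names, field names, tag labels, and the example word are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyntacticWord:
    """One segment of an orthographic word (hypothetical schema)."""
    form: str                                   # diacritized segment
    pos: str                                    # part-of-speech tag (labels here are illustrative)
    features: Dict[str, str] = field(default_factory=dict)  # morphological features


@dataclass
class OrthographicWord:
    """A whitespace-delimited word, segmented into syntactic words."""
    form: str
    segments: List[SyntacticWord] = field(default_factory=list)


@dataclass
class Narrative:
    """One hadith narrative with paragraph-level aligned translations."""
    narrative_id: int
    words: List[OrthographicWord] = field(default_factory=list)
    translations: Dict[str, str] = field(default_factory=dict)  # language code -> text

    def segment_count(self) -> int:
        # Total number of syntactic words across all orthographic words.
        return sum(len(w.segments) for w in self.words)


# Illustrative entry (not taken from the corpus): "wa-qāla", "and he said",
# split into a conjunction and a verb.
example = Narrative(
    narrative_id=0,
    words=[
        OrthographicWord(
            form="وَقَالَ",
            segments=[
                SyntacticWord(form="وَ", pos="CONJ"),
                SyntacticWord(
                    form="قَالَ",
                    pos="V",
                    features={"aspect": "perfect", "person": "3", "gender": "m"},
                ),
            ],
        )
    ],
    translations={"en": "and he said"},
)

print(example.segment_count())
```

Keeping translations on the narrative rather than on individual words mirrors the abstract's statement that alignment is at the narrative/paragraph level.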
Metadata
| Item Type: | Proceedings Paper | 
|---|---|
| Authors/Creators: | Alosaimy, A; Atwell, E (orcid.org/0000-0001-9395-3764) |
| Copyright, Publisher and Additional Information: | This is an author produced version of the paper 'Sunnah Arabic Corpus: Design and Methodology', presented at IMAN 2017. | 
| Keywords: | Arabic, corpus, annotation, Hadith, Sunnah, morphology | 
| Dates: | |
| Institution: | The University of Leeds | 
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) | 
| Depositing User: | Symplectic Publications | 
| Date Deposited: | 03 Jan 2018 16:06 | 
| Last Modified: | 22 Mar 2018 07:13 | 
| Status: | Published | 
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:125569 | 
