Alosaimy, A and Atwell, E orcid.org/0000-0001-9395-3764 (2017) Sunnah Arabic Corpus: Design and Methodology. In: Proceedings of the 5th International Conference on Islamic Applications in Computer Science and Technologies (IMAN 2017). IMAN 2017, 26-28 Dec 2017, Semarang, Indonesia.
Abstract
Sunnah Arabic Corpus is an annotated linguistic resource that consists of 144K words/170K tokens of Hadith narratives (utterances attributed to the Prophet Mohammed) extracted from the book Riyāḍu Aṣṣāliḥīn. As a first layer of annotation, the corpus has been fully diacritized. In addition, each orthographic word/token is segmented into its syntactic words, and each syntactic word is tagged with its part of speech and multiple morphological features. Several Hadith translations in different languages are provided and aligned at the narrative/paragraph level. The corpus follows the standards of the successful Quranic Arabic Corpus (corpus.quran.com). Sunnah Arabic Corpus is freely available under the Creative Commons Attribution-ShareAlike 4.0 International License.
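The annotation layers described in the abstract (diacritized orthographic words, their syntactic-word segments with part-of-speech and morphological features, and narrative-level translations) could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names, tag labels (`CONJ`, `V`), and feature keys are assumptions made for the example and are not taken from the corpus's actual file format or tagset.

```python
from dataclasses import dataclass, field


@dataclass
class SyntacticWord:
    """One segment of an orthographic word (hypothetical schema)."""
    form: str                     # diacritized surface form of the segment
    pos: str                      # part-of-speech label (placeholder tagset)
    features: dict[str, str] = field(default_factory=dict)  # morphological features


@dataclass
class OrthographicWord:
    """A whitespace-delimited word and its syntactic-word segments."""
    form: str
    segments: list[SyntacticWord]


@dataclass
class Narrative:
    """A Hadith narrative with translations aligned at the narrative level."""
    narrative_id: str             # hypothetical identifier
    words: list[OrthographicWord]
    translations: dict[str, str] = field(default_factory=dict)  # language code -> text

    def syntactic_word_count(self) -> int:
        # Total number of segments across all orthographic words.
        return sum(len(word.segments) for word in self.words)


# Illustrative entry: the word "wa-qāla" ("and he said") splits into a
# conjunction and a perfect verb. Labels and feature keys are assumptions.
wa_qala = OrthographicWord(
    form="وَقَالَ",
    segments=[
        SyntacticWord(form="وَ", pos="CONJ"),
        SyntacticWord(
            form="قَالَ",
            pos="V",
            features={
                "aspect": "perfect",
                "person": "3",
                "gender": "masculine",
                "number": "singular",
            },
        ),
    ],
)

narrative = Narrative(
    narrative_id="example-1",
    words=[wa_qala],
    translations={"en": "And he said"},
)
```

Keeping the segments nested under their orthographic word preserves both levels of tokenization, so counts can be taken at either the orthographic-word level or the syntactic-word level.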
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Alosaimy, A; Atwell, E (orcid.org/0000-0001-9395-3764) |
| Copyright, Publisher and Additional Information: | This is an author produced version of the paper 'Sunnah Arabic Corpus: Design and Methodology', presented at IMAN 2017. |
| Keywords: | Arabic, corpus, annotation, Hadith, Sunnah, morphology |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Depositing User: | Symplectic Publications |
| Date Deposited: | 03 Jan 2018 16:06 |
| Last Modified: | 22 Mar 2018 07:13 |
| Status: | Published |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:125569 |
