Tarmom, T, Atwell, E (orcid.org/0000-0001-9395-3764) and Alsalka, M (2019) Non-authentic Hadith Corpus: Design and Methodology. In: Proceedings of IMAN 2019. International Conference on Islamic Applications in Computer Science and Technologies (IMAN 2019), 27-28 Dec 2019, Kuala Lumpur, Malaysia.
Abstract
The primary religious text of Islam is the Quran. The Hadith, the second source, refers to any action, saying, order or silent approval of the Prophet Muhammad that has been delivered through a chain of narrators. Each Hadith has an Isnad (the chain of narrators) and a Matan (the act of the Prophet Muhammad). In contrast to the Quran, some Hadiths, which have been handed down over the centuries, have been corrupted by narrators who were not competent in transmitting them. Hadith scholars classify these as non-authentic Hadith (NAH). To evaluate different classifiers for the automatic classification of Arabic Hadith, it is necessary to build Arabic Hadith corpora that contain samples of both authentic and non-authentic Hadith for training and testing models. This paper aims to create a new NAH corpus consisting of 452,624 words from six different Hadith books. The subsequent aim is to annotate this corpus with Hadith features such as the Isnad, the Matan and the Hadith authenticity, and thereby to provide a ground truth.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Tarmom, T; Atwell, E; Alsalka, M |
Copyright, Publisher and Additional Information: | Reproduced in accordance with the publisher's self-archiving policy. |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 15 Jan 2020 14:09 |
Last Modified: | 20 Feb 2020 14:48 |
Published Version: | http://dsr.edu.my/iman/papers.html |
Status: | Published online |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:155642 |