Tarmom, T orcid.org/0000-0002-2834-461X, Atwell, E orcid.org/0000-0001-9395-3764 and Alsalka, MA (2020) Non-authentic Hadith Corpus: Design and Methodology. International Journal on Islamic Applications in Computer Science And Technology, 8 (3). pp. 13-19. ISSN 2289-4012
Abstract
The primary religious text of Islam is the Quran. The Hadith, the second source, refers to any action, saying, order or silent approval of the holy Prophet Muhammad that has been delivered through a chain of narrators. Each Hadith has an Isnad (the chain of narrators) and a Matan (the act of the Prophet Muhammad). Unlike the Quran, some Hadiths handed down over the centuries have been corrupted by narrators who were not competent in transmitting them. Hadith scholars classify these as non-authentic Hadith (NAH). Evaluating different classifiers for the automatic classification of Arabic Hadith requires Arabic Hadith corpora that contain samples of both authentic and non-authentic Hadith for training and testing models. This paper presents a new NAH corpus of 452,624 words drawn from six different Hadith books. A subsequent aim is to annotate the corpus with Hadith features such as the Isnad, the Matan and the authenticity of each Hadith, and so provide a ground truth.
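As a rough illustration of the annotation described in the abstract, the following minimal sketch models one corpus entry with its Isnad, Matan and authenticity label. The class name `HadithRecord`, its field names, the helper `split_for_classifier`, the whitespace word count and the placeholder strings are all assumptions made for illustration; they are not the schema, counting method or data used in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HadithRecord:
    """One annotated corpus entry (hypothetical schema, not the paper's)."""
    book: str          # source Hadith book (placeholder identifier)
    isnad: List[str]   # chain of narrators, in order
    matan: str         # the reported act or saying
    authentic: bool    # ground-truth authenticity label

    def word_count(self) -> int:
        # Whitespace tokenisation is an assumption; the abstract does not
        # say how the corpus words were counted.
        return len(" ".join(self.isnad).split()) + len(self.matan.split())


def split_for_classifier(records: List[HadithRecord]) -> Tuple[List[str], List[bool]]:
    """Return parallel lists of Matan texts and authenticity labels."""
    texts = [r.matan for r in records]
    labels = [r.authentic for r in records]
    return texts, labels


# Placeholder entries only; no real Hadith text is reproduced here.
records = [
    HadithRecord("book_a", ["narrator one", "narrator two"], "placeholder matan text", False),
    HadithRecord("book_b", ["narrator three"], "another placeholder text here", True),
]
texts, labels = split_for_classifier(records)
```

Keeping the Isnad as an ordered list rather than a single string would let the narrator chain be inspected independently of the Matan, which is one plausible reason to annotate the two parts separately.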
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Tarmom, T (orcid.org/0000-0002-2834-461X); Atwell, E (orcid.org/0000-0001-9395-3764); Alsalka, MA |
Keywords: | Hadith Corpus, Non-authentic Hadith, Arabic, Natural Language Processing, Corpus Linguistics |
Dates: | 2020 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 21 Oct 2020 11:03 |
Last Modified: | 02 Nov 2020 12:07 |
Published Version: | http://www.sign-ific-ance.co.uk/index.php/IJASAT/a... |
Status: | Published |
Publisher: | Design for Scientific Renaissance |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:166861 |