Alshammari, I.K. orcid.org/0000-0002-7619-373X, Atwell, E. and Alsalka, M.A. (Cover date: December 2023) Topic Modeling for Hadith Corpus: A Comparison of Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), and BERTopic with AraBERT, XLM-R, MARBERT, and CAMeLBERT. International Journal on Islamic Applications in Computer Science And Technology, 11 (4). pp. 9-16. ISSN 2289-4012
Abstract
The primary source of Islamic law after the Holy Qur'an is the collection of authentic Hadith attributed to the Prophet of God, peace be upon him (PBUH). The importance of the Prophet's Hadith is evident in its role as an explanation of the Qur'an and its abstract topics. This research therefore applies different topic modeling techniques to the authentic Hadith and examines their performance. Topic modeling is the process of automatically clustering documents and words in a text collection into topics. Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) are the most widely used topic modeling techniques. BERTopic is a more recent technique that builds on BERT, using pre-trained transformer-based language models for topic modeling. This study applies these topic modeling approaches to the "Matn" (text) part of the authentic Hadith. We then compare the performance of BERTopic, using state-of-the-art pre-trained Arabic language models, with that of LDA and NMF. Finally, we evaluate the topic coherence of each method using normalized pointwise mutual information (NPMI). The findings indicate that BERTopic outperforms LDA and NMF in overall performance.
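The abstract names NPMI as the coherence measure. For two top words w_i and w_j of a topic, NPMI(w_i, w_j) = log[P(w_i, w_j) / (P(w_i) P(w_j))] / (-log P(w_i, w_j)), which ranges from -1 (never co-occur) to 1 (always co-occur). A topic's coherence is commonly taken as the average NPMI over all pairs of its top words. The sketch below is illustrative only and is not the authors' evaluation code. It assumes probabilities are estimated from whole-document co-occurrence, whereas some libraries (for example gensim's `CoherenceModel` with `coherence='c_npmi'`) use sliding windows instead. The toy documents and the helper names `npmi_pair` and `topic_coherence` are hypothetical.

```python
import math
from itertools import combinations


def npmi_pair(w1, w2, doc_sets):
    """NPMI of two words, with probabilities estimated from document co-occurrence
    (an assumption of this sketch; sliding-window estimates are also common)."""
    n_docs = len(doc_sets)
    p1 = sum(w1 in d for d in doc_sets) / n_docs
    p2 = sum(w2 in d for d in doc_sets) / n_docs
    p12 = sum(w1 in d and w2 in d for d in doc_sets) / n_docs
    if p12 == 0.0:
        return -1.0  # the words never co-occur: minimum NPMI by convention
    if p12 == 1.0:
        return 1.0   # both words appear in every document: maximum NPMI
    pmi = math.log(p12 / (p1 * p2))
    return pmi / -math.log(p12)


def topic_coherence(top_words, documents):
    """Average NPMI over all unordered pairs of a topic's top words."""
    doc_sets = [set(doc) for doc in documents]
    pairs = list(combinations(top_words, 2))
    return sum(npmi_pair(a, b, doc_sets) for a, b in pairs) / len(pairs)


# Hypothetical toy corpus of tokenized documents (not Hadith data).
docs = [
    ["prayer", "mosque", "friday"],
    ["prayer", "mosque", "ablution"],
    ["fasting", "ramadan", "charity"],
    ["fasting", "ramadan", "night"],
]

print(topic_coherence(["prayer", "mosque", "friday"], docs))
```

In this toy corpus, "prayer" and "mosque" always appear together, so their pair scores 1, while each of their pairs with the rarer "friday" scores 0.5. This gives the three-word topic a coherence of about 0.67. A topic mixing "prayer" and "ramadan", which never co-occur, would score -1.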
Metadata
Item Type: | Article |
---|---|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Apr 2024 10:00 |
Last Modified: | 11 Apr 2024 10:00 |
Published Version: | http://www.sign-ific-ance.co.uk/index.php/IJASAT/a... |
Status: | Published |
Publisher: | Design for Scientific Renaissance |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:211346 |