Alshammari, I.K. orcid.org/0000-0002-7619-373X, Atwell, E. and Alsalka, M.A. (Cover date: March 2023) Evaluation of Arabic Named Entity Recognition Models on Sahih Al-Bukhari Text. International Journal on Islamic Applications in Computer Science And Technology, 11 (1). pp. 1-8. ISSN 2289-4012
Abstract
In this paper, four Arabic named entity recognition (ANER) models were applied to the Sahih Al-Bukhari (صحيح البخاري) dataset: CAMeLBERT-CA, Hatmimoha, Marefa-NER, and Stanza. The study's main aim is to identify the best-performing model for use with other Hadith datasets. The Stanza and Marefa-NER models performed best, obtaining F1-scores of 0.826191 and 0.807396, respectively. A new test dataset of approximately 5,000 words was then created based on the CANERCorpus annotation. The four models were evaluated on this new test dataset and achieved disappointing F1-scores, although Hatmimoha had the best results. This likely resulted from the small size of the dataset. However, we observed that a model that has many named entity classes and matches the CANERCorpus labels can achieve high performance, as the Hatmimoha and Marefa-NER models did.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Alshammari, I.K., Atwell, E. and Alsalka, M.A. |
Dates: | Cover date: March 2023 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Apr 2024 10:03 |
Last Modified: | 11 Apr 2024 10:03 |
Published Version: | http://www.sign-ific-ance.co.uk/index.php/IJASAT/a... |
Status: | Published |
Publisher: | Design for Scientific Renaissance |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:211347 |