Alnefaie, S., Atwell, E. and Ammar Alsalka, M. (2023) HAQA and QUQA: Constructing two Arabic Question-Answering Corpora for the Quran and Hadith. In: Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Processings. International Conference Recent Advances in Natural Language Processing, 04-06 Sep 2023, Varna, Bulgaria. INCOMA Ltd., Shoumen, BULGARIA, pp. 90-97. ISBN 978-954-452-092-2
Abstract
It is neither possible nor fair to compare the performance of question-answering systems for the Holy Quran and Hadith Sharif in Arabic due to both the absence of a golden test dataset on the Hadith Sharif and the small size and easy questions of the newly created golden test dataset on the Holy Quran. This article presents two question–answer datasets: Hadith Question–Answer pairs (HAQA) and Quran Question–Answer pairs (QUQA). HAQA is the first Arabic Hadith question–answer dataset available to the research community, while the QUQA dataset is regarded as the more challenging and the most extensive collection of Arabic question–answer pairs on the Quran. HAQA was designed and its data collected from several expert sources, while QUQA went through several steps in the construction phase; that is, it was designed and then integrated with existing datasets in different formats, after which the datasets were enlarged with the addition of new data from books by experts. The HAQA corpus consists of 1598 question–answer pairs, and the QUQA corpus contains 3382. They may be useful as gold-standard datasets for the evaluation process, as training datasets for language models with question-answering tasks and for other uses in artificial intelligence.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | ACL materials are Copyright © 1963–2023 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 19 Dec 2023 14:35 |
Last Modified: | 19 Dec 2023 14:35 |
Published Version: | https://aclanthology.org/2023.ranlp-1.10/ |
Status: | Published |
Publisher: | INCOMA Ltd., Shoumen, BULGARIA |
Identification Number: | 10.26615/978-954-452-092-2_010 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:206720 |