Hamoud, B and Atwell, E orcid.org/0000-0001-9395-3764 (2017) Evaluation corpus for restricted-domain question-answering systems for the holy Quran. International Journal of Science and Research, 6 (8). pp. 1133-1138. ISSN 2319-7064
Abstract
This paper presents the compilation of a corpus of question-answer pairs for the holy Quran. The corpus was collected manually from a wide range of sources and forms the Quran Arabic-English Question and Answer Corpus (QAEQ&AC). QAEQ&AC is a written, bilingual corpus comprising Arabic and English text. First, question-answer pairs were collected from several trusted expert sources. The data were then merged and cleaned using Microsoft Excel. After that, the data were converted to a format suitable for mining tools, namely a comma-separated values (CSV) file. The resulting corpus consists of more than 1,500 question-answer pairs, totalling nearly 50,000 words, divided between Arabic and English. It includes different question types, such as what, when, and why, and answers of different lengths. We anticipate that the current and subsequent versions of our corpus will be a valuable evaluation resource for computational linguists investigating Quran question answering; it might be used as a gold standard in research on natural language processing, information retrieval, and artificial intelligence. The corpus can also be annotated to derive linguistic information, such as morphological, syntactic, semantic, and lexical information.
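As a rough illustration of how a CSV file of question-answer pairs of the kind described in the abstract might be inspected, the sketch below loads rows, classifies each question by its opening word, and computes a mean answer length. The column names (`question`, `answer`, `language`), the helper functions, and the sample rows are all hypothetical placeholders; the actual QAEQ&AC file layout is not specified here and may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical column layout and invented placeholder rows;
# the real QAEQ&AC file may use different headers and content.
SAMPLE_CSV = """question,answer,language
What is placeholder A?,Placeholder answer one.,en
When does placeholder B occur?,Placeholder answer number two.,en
Why is placeholder C used?,Short answer.,en
What is placeholder D?,Another placeholder answer here.,en
"""

# English question words only; Arabic questions would need their own list.
QUESTION_WORDS = ("what", "when", "why", "who", "where", "how", "which")


def load_pairs(handle):
    """Read question-answer rows from an open CSV file handle."""
    return list(csv.DictReader(handle))


def question_type(question):
    """Classify a question by its first word, falling back to 'other'."""
    words = question.strip().lower().split()
    first = words[0] if words else ""
    return first if first in QUESTION_WORDS else "other"


def summarize(pairs):
    """Count question types and compute the mean answer length in words."""
    types = Counter(question_type(p["question"]) for p in pairs)
    lengths = [len(p["answer"].split()) for p in pairs]
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return types, mean_len


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
types, mean_len = summarize(pairs)
print(dict(types), mean_len)
```

To read a file on disk instead of the in-memory sample, one could pass `open(path, newline="", encoding="utf-8")` to `load_pairs`; UTF-8 is an assumption here, chosen because the corpus contains Arabic text.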
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Hamoud, B; Atwell, E (orcid.org/0000-0001-9395-3764) |
Copyright, Publisher and Additional Information: | This article is licensed under the terms of the Creative Commons Attribution License CC-BY [https://creativecommons.org/licenses/by/4.0/]. |
Keywords: | QAEQ&AC, Quran, corpus, data, question-answer pairs, dataset |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 08 Jan 2018 15:52 |
Last Modified: | 06 Jul 2018 10:18 |
Published Version: | https://www.ijsr.net/archive/v6i8/4081701.pdf |
Status: | Published |
Publisher: | International Journal of Science and Research |
Identification Number: | 10.21275/4081701 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:125920 |