Hamoud, B and Atwell, ES orcid.org/0000-0001-9395-3764 (2016) Quran question and answer corpus for data mining with WEKA. In: 2016 Conference of Basic Sciences and Engineering Studies (SGCAC). 2016 Conference of Basic Sciences and Engineering Studies (SGCAC), 20-23 Feb 2016, Khartoum, Sudan. IEEE, pp. 211-216. ISBN 978-1-5090-1811-6
Abstract
This paper presents the compilation of a Holy Quran question-and-answer corpus, created for data mining with the Waikato Environment for Knowledge Analysis (WEKA). Questions and answers from the Quran were collected from multiple data sources, and a representative sample of the questions and answers was selected for use in our model. The data was then cleaned to bring its quality up to the level required by the WEKA tool and converted to comma-separated values (CSV) format, giving a corpus dataset that can be loaded into WEKA. The StringToWordVector filter was then used to convert each string into a bag, or vector, of word frequencies for further analysis with different data mining techniques. Finally, we applied a clustering algorithm to the processed attributes and show the resulting WEKA cluster visualization.
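The pipeline described in the abstract (CSV loading, string-to-word-vector conversion, clustering) can be illustrated with a minimal Python sketch. This is not the authors' WEKA workflow and not WEKA's StringToWordVector implementation: the sample rows, the column names `question` and `answer`, the simple tokenizer, and the choice of k-means are all assumptions made for illustration, since the abstract does not name the clustering algorithm or reproduce the corpus.

```python
import csv
import io
import re
from collections import Counter

# Illustrative rows only; these are NOT taken from the paper's corpus.
# The column names "question" and "answer" are assumed.
SAMPLE_CSV = """question,answer
How many chapters are in the Quran?,The Quran has 114 chapters
What is the first chapter called?,The first chapter is called Al-Fatiha
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def tokenize(s):
    """Lowercase the string and keep runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", s.lower())


def to_word_vectors(strings):
    """Return (vocabulary, vectors) with one word-count vector per string."""
    counts = [Counter(tokenize(s)) for s in strings]
    vocab = sorted(set().union(*counts))
    vectors = [[c[w] for w in vocab] for c in counts]
    return vocab, vectors


def kmeans(vectors, k, iterations=10):
    """Plain k-means (assumed algorithm); the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    vocab, vectors = to_word_vectors([r["question"] for r in rows])
    print(vocab)
    print(kmeans(vectors, k=2))
```

Seeding the centroids with the first k vectors keeps the sketch deterministic; a real analysis would normally use randomized or smarter initialization and a tokenizer suited to the corpus language.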
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Hamoud, B; Atwell, ES (orcid.org/0000-0001-9395-3764) |
Copyright, Publisher and Additional Information: | (c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. |
Keywords: | Data mining, WEKA, dataset, Corpus, Quran |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Funding Information: | Funder: EPSRC; Grant number: EP/K015206/1 |
Depositing User: | Symplectic Publications |
Date Deposited: | 17 Jun 2016 11:44 |
Last Modified: | 15 Nov 2016 21:26 |
Published Version: | http://dx.doi.org/10.1109/SGCAC.2016.7458032 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/SGCAC.2016.7458032 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:101066 |