Extraction of Multi-Word Terms and Complex Terms from the Classical Arabic Text of the Quran

Abstract

The identification of domain-specific terms is a crucial step in many natural language processing applications. Term extraction is the process of obtaining a set of terms that represent the domain of a given text. Most term extraction research on the Quran has used translated text instead of the original Classical Arabic text. Extracting terms from the original Arabic text rather than from a translation may help in retrieving more relevant terms, because some Islamic terms in the Quran lack equivalents in other languages. This paper demonstrates a hybrid method for acquiring a list of domain-specific terms from the Arabic text of the Quran. The resulting list of terms was validated using a common evaluation metric for ranked lists; a precision of up to 0.81 was achieved for the top 200 terms. We discuss the achieved precision in the context of two existing datasets from previous research.
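
The abstract does not name the evaluation metric. As a minimal sketch, assuming it is precision at k (the share of the top k ranked candidates that appear in a reference list), the calculation could look like the following. The function name `precision_at_k` and the toy data are hypothetical illustrations and are not taken from the paper.

```python
def precision_at_k(ranked_terms, gold_terms, k):
    """Fraction of the top-k ranked candidates found in the gold set.

    Assumption: this is the standard precision-at-k definition, which
    may differ from the exact metric used in the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in gold_terms)
    # Divide by k (not len(top_k)), so a list shorter than k is penalised.
    return hits / k


# Hypothetical toy data, for illustration only.
ranked = ["term_a", "term_b", "term_c", "term_d", "term_e"]
gold = {"term_a", "term_c", "term_d"}

score = precision_at_k(ranked, gold, 4)  # 3 of the top 4 are in the gold set
```

Under this reading, a reported precision of 0.81 for the top 200 terms would correspond to 162 of the 200 highest-ranked candidates matching the reference list.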

Metadata

Item Type:	Article
Authors/Creators:	Alrehaili, SM (https://orcid.org/0000-0002-4957-2478); Atwell, E (https://orcid.org/0000-0001-9395-3764)
Editors:	Khedher, MZ
Keywords:	Quran terms; automatic term recognition; term extraction
Dates:	Accepted: 3 September 2017; Published (online): 25 September 2017; Published: 25 September 2017
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds)
Depositing User:	Symplectic Publications
Date Deposited:	21 Nov 2017 11:48
Last Modified:	05 Aug 2019 09:55
Status:	Published
Publisher:	Design for Scientific Renaissance
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:124245
