Alrabiah, M, Alhelewh, N, Al-Salman, A and Atwell, ES (2014) An empirical study on the Holy Quran based on a large classical Arabic corpus. International Journal of Computational Linguistics (IJCL), 5 (1). pp. 1-13. ISSN 2180-1266
Abstract
Distributional semantics is one of the empirical approaches to natural language processing and acquisition. It is mainly concerned with modeling word meaning using word distribution statistics gathered from huge corpora. Many distributional semantic models are available in the literature, but none has so far been applied to the Quran or to Classical Arabic in general. This paper reports the construction of a very large corpus of Classical Arabic that will be used as a base to study the distributional lexical semantics of the Quran and Classical Arabic. It also reports the results of two empirical studies: the first applies a number of probabilistic distributional semantic models to automatically identify lexical collocations in the Quran, and the second applies the same models to the Classical Arabic corpus to test their ability to capture lexical collocations and co-occurrences for a number of the corpus words. Results show that the MI.log_freq association measure performed best in extracting significant co-occurrences and collocations from both small and large Classical Arabic corpora, while the mutual information association measure performed worst.
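To make the two association measures named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that MI.log_freq is the mutual information score multiplied by the natural log of the pair frequency plus one, which is the Sketch Engine-style definition; the paper itself may define it differently. The function name `association_scores`, the `window` parameter, and the way pairs are counted are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def association_scores(tokens, window=2):
    """Score ordered word pairs with MI and an assumed MI.log_freq.

    A pair (w, v) is counted once each time v appears within `window`
    tokens to the right of w. Both formulas are illustrative assumptions,
    not taken from the paper.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            pair_freq[(w, v)] += 1

    scores = {}
    for (w, v), f_xy in pair_freq.items():
        # Pointwise mutual information: log2 of observed over expected frequency.
        mi = math.log2(f_xy * n / (word_freq[w] * word_freq[v]))
        scores[(w, v)] = {
            "freq": f_xy,
            "mi": mi,
            # Assumed MI.log_freq: weight MI by the log of the pair frequency.
            "mi_log_freq": mi * math.log(f_xy + 1),
        }
    return scores
```

On a toy token list such as `"a b f1 a b f2 a b f3 a b f4 c d".split()` with `window=1`, plain MI ranks the pair that occurs once, `("c", "d")`, above the pair that occurs four times, `("a", "b")`, whereas the frequency-weighted score reverses that order. This bias of MI toward rare pairs is a commonly cited weakness of the measure and is in line with its poor showing in the abstract.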
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Alrabiah, M; Alhelewh, N; Al-Salman, A; Atwell, ES |
Copyright, Publisher and Additional Information: | (c) 2014, Alrabiah, M, Alhelewh, N, Al-Salman, A and Atwell, ES. This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial ShareAlike (CC BY-NC-SA 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, provided the original work is properly cited, the use is non-commercial and any derivative works are licensed under the same terms. |
Keywords: | Distributional lexical semantics; Quran; classical Arabic corpus; collocation extraction; association measures |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 08 Dec 2014 12:32 |
Last Modified: | 16 Jan 2018 21:22 |
Published Version: | http://www.cscjournals.org/journals/IJCL/issue-man... |
Status: | Published |
Publisher: | CSC Journals |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:81839 |