Alshammeri, M, Atwell, E orcid.org/0000-0001-9395-3764 and Alsalka, MA orcid.org/0000-0003-3335-1918 (2020) Quranic Topic Modelling Using Paragraph Vectors. In: Advances in Intelligent Systems and Computing. 2020 Intelligent Systems Conference (IntelliSys), 03-04 Sep 2020, Online. Springer Verlag, pp. 218-230. ISBN 978-3-030-55186-5
Abstract
The Quran is known for its linguistic and spiritual value. It comprises knowledge and topics that govern different aspects of people’s lives. Acquiring and encoding this knowledge is not a trivial task because meanings overlap across its documents and passages. Analysing a text like the Quran requires learning approaches that go beyond the word level to achieve sentence-level representations. Thus, in this work, we follow a deep learning approach, the paragraph vector, to learn an informative representation of Quranic verses. We use a recent breakthrough in embeddings that maps the passages of the Quran to vector representations that preserve more semantic and syntactic information. These vectors can be used as inputs for machine learning models and leveraged for topic analysis. Moreover, we evaluated the derived clusters of related verses against a tagged corpus, to add more significance to our conclusions. Using the paragraph vector model, we managed to generate a document embedding space that models and explains word distribution in the Holy Quran. The dimensions in the space represent the semantic structure in the data and ultimately help to identify the main topics and concepts in the text.
Metadata
Item Type: | Proceedings Paper |
---|---|
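Authors/Creators: | Alshammeri, M; Atwell, E (orcid.org/0000-0001-9395-3764); Alsalka, MA (orcid.org/0000-0003-3335-1918) |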
Copyright, Publisher and Additional Information: | © Springer Nature Switzerland AG 2021. This is an author produced version of an article published in Advances in Intelligent Systems and Computing. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Holy Quran; Semantic analysis; Distributional representation; Topic modeling; Deep learning; Document embedding; Paragraph vector |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 14 Jul 2020 09:12 |
Last Modified: | 25 Aug 2021 00:38 |
Status: | Published |
Publisher: | Springer Verlag |
Identification Number: | 10.1007/978-3-030-55187-2_19 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:163016 |