Sawalha, M, Brierley, C, Atwell, E orcid.org/0000-0001-9395-3764 et al. (1 more author) (2017) Text Analytics and Transcription Technology for Quranic Arabic. International Journal on Islamic Applications in Computer Science and Technology (IJASAT), 5 (2). pp. 45-51. ISSN 2289-4012
Abstract
Natural Language Processing Working Together with Arabic and Islamic Studies is a two-year project funded by the UK Engineering and Physical Sciences Research Council (EPSRC) to study prosodic-syntactic mark-up in the Quran (Atwell et al. 2013). Tajwīd, or correct Quranic recitation, is very important in Islam. The original insight informing this project is to view tajwīd mark-up in the Quran as additional text-based data for computational analysis. This mark-up is already incorporated into Quranic Arabic script; it identifies phrase boundaries of different strengths, as well as lengthened syllables that denote prosodically and semantically salient words. We have developed a grapheme-phoneme mapping scheme (Brierley et al. 2016) and state-of-the-art software (Sawalha et al. 2014) for generating a stressed and syllabified phonemic transcription, or citation form, for each word in the entire text of the Quran, using the International Phonetic Alphabet (IPA). This canonical pronunciation tier for Classical Arabic is informed and evaluated by Arabic linguists, tajwīd scholars, and phoneticians, and is published in an open-source Boundary-Annotated Quran corpus and machine learning dataset (ibid.). We use statistical techniques such as keyword extraction to explore semiotic relationships between sound and meaning in the Quran, invoking a Saussurean-type view of the sign as ‘...a bi-unity of expression and content...’ (Dickins 2007). Our investigation entails: (i) text data mining for statistically significant phonemes, syllables, words, and correlates of rhythmic juncture; and (ii) interpretation of the results from interdisciplinary perspectives: Corpus Linguistics, tajwīd science, Arabic Linguistics, and Phonetics and Phonology.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Sawalha, M; Brierley, C; Atwell, E; et al. (1 more author) |
Keywords: | tajwīd; prosody; phonemic transcription; phrase boundary |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: EPSRC; Grant number: EP/K015206/1 |
Depositing User: | Symplectic Publications |
Date Deposited: | 02 Aug 2017 13:04 |
Last Modified: | 05 Oct 2017 15:39 |
Published Version: | http://www.sign-ific-ance.co.uk/index.php/IJASAT/a... |
Status: | Published |
Publisher: | Design for Scientific Renaissance (DSR) |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:119716 |