Sawalha, M, Atwell, ES and Abushariah, M (2013) SALMA: Standard Arabic Language Morphological Analysis. In: Proceedings ICCSPA International Conference on Communications, Signal Processing, and their Applications. ICCSPA International Conference on Communications, Signal Processing, and their Applications, 12-14 Feb 2013, Sharjah, UAE. IEEE, pp. 1-6. ISBN 9781467328203
Abstract
Morphological analyzers are preprocessors for text analysis, and many text analytics applications depend on them to perform their tasks. This paper reviews the SALMA-Tools (Standard Arabic Language Morphological Analysis) [1]. The SALMA-Tools are a collection of open-source standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, in both vowelized and non-vowelized text. Tag assignment is significantly more complex for Arabic than for many other languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of one tag per word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers that require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA-Tagger is a fine-grained morphological analyzer that depends mainly on linguistic information extracted from traditional Arabic grammar books and on prior-knowledge broad-coverage lexical resources, namely the SALMA-ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA-Tag Set is a standard tag set for encoding that captures long-established, traditional fine-grained morphological features of Arabic in a notation format intended to be compact yet transparent.
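The idea of one subtag per word part, rather than one tag per word, can be illustrated with a minimal Python sketch. It is not the SALMA-Tagger and does not use the SALMA-Tag Set notation: the class names, the helper functions, the example segmentation and the plain-English subtag labels are all hypothetical and chosen only for illustration. The ordering check also assumes, as a simplification, that each analysis has exactly one stem.

```python
from dataclasses import dataclass
from enum import Enum


class Part(Enum):
    """The five word parts named in the abstract, in surface order."""
    PROCLITIC = "proclitic"
    PREFIX = "prefix"
    STEM = "stem"
    SUFFIX = "suffix"
    ENCLITIC = "enclitic"


@dataclass(frozen=True)
class Morpheme:
    part: Part
    surface: str  # the letters of this morpheme (non-vowelized here)
    subtag: str   # hypothetical label, NOT an actual SALMA tag


def surface_form(morphemes):
    """Rebuild the written word by concatenating the morpheme surfaces."""
    return "".join(m.surface for m in morphemes)


def is_well_ordered(morphemes):
    """Check that parts appear in proclitic..enclitic order with one stem.

    The single-stem requirement is a simplifying assumption of this sketch.
    """
    order = list(Part)
    ranks = [order.index(m.part) for m in morphemes]
    stems = sum(1 for m in morphemes if m.part is Part.STEM)
    return ranks == sorted(ranks) and stems == 1


# Illustrative analysis of a non-vowelized word meaning roughly
# "and they will write it"; segmentation and labels are assumptions.
ANALYSIS = [
    Morpheme(Part.PROCLITIC, "و", "conjunction"),
    Morpheme(Part.PROCLITIC, "س", "future particle"),
    Morpheme(Part.PREFIX, "ي", "imperfect verb prefix"),
    Morpheme(Part.STEM, "كتب", "verb stem"),
    Morpheme(Part.SUFFIX, "ون", "masculine plural subject"),
    Morpheme(Part.ENCLITIC, "ها", "object pronoun"),
]

if __name__ == "__main__":
    print(surface_form(ANALYSIS))
    for m in ANALYSIS:
        print(f"{m.part.value:10} {m.surface:4} {m.subtag}")
```

Keeping each part as its own record means a downstream tool can read the stem's subtag without parsing a single fused word-level tag, which is the motivation the abstract gives for subtags.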
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Sawalha, M; Atwell, ES; Abushariah, M |
Keywords: | Morphological analysis; tag set; fine-grain; traditional Arabic grammar; traditional Arabic Lexicons |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 02 Dec 2014 15:15 |
Last Modified: | 19 Dec 2022 13:29 |
Published Version: | http://dx.doi.org/10.1109/ICCSPA.2013.6487311 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/ICCSPA.2013.6487311 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:81623 |