This is the latest version of this eprint.
Sawalha, MS and Atwell, ES (2010) Constructing and using broad-coverage lexical resource for enhancing morphological analysis of Arabic. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10). Language Resource and Evaluation Conference LREC 2010, 17 May 2010 - 23 May 2010, Valletta, Malta. European Language Resources Association (ELRA), 282-287 (6). ISBN 2-9517408-6-7
Abstract
Broad-coverage language resources that provide prior linguistic knowledge should improve the accuracy and performance of NLP applications. We are constructing a broad-coverage lexical resource to improve the accuracy of morphological analyzers and part-of-speech taggers for Arabic text. Over the past 1200 years, many different kinds of Arabic lexicons have been constructed; they differ in ordering, size and purpose. We collected 23 machine-readable lexicons that are freely available on the web, and combined them into one large broad-coverage lexical resource by extracting information from disparate formats and merging traditional Arabic lexicons. To evaluate the resource, we computed its coverage over the Qur’an, the Corpus of Contemporary Arabic, and a sample from the Arabic Web Corpus, using two methods. The first counts exact word matches between the test corpora and the lexicon, and scored about 65-68%; because Arabic has a rich morphology with many combinations of roots, affixes and clitics, about a third of the words in the corpora had no exact match in the lexicon. The second computes coverage through a lemmatizer program, which strips clitics to look for a match for the underlying lexeme; this scored about 82-85%.
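The two coverage measures described in the abstract can be illustrated with a minimal sketch. This is not the authors' lemmatizer: the clitic lists `PROCLITICS` and `ENCLITICS`, the function names, and the toy lexicon are all hypothetical and chosen only to show the difference between exact matching and matching after clitic stripping.

```python
from typing import Iterable, Iterator, Set

# Hypothetical, deliberately tiny clitic inventories for illustration only;
# a real Arabic lemmatizer would use far richer lists and rules.
PROCLITICS = ("وال", "بال", "ال", "و", "ف", "ب", "ل", "ك")
ENCLITICS = ("هما", "هم", "ها", "نا", "ه", "ك", "ي")


def exact_coverage(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of tokens that appear verbatim in the lexicon."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def candidate_stems(token: str) -> Iterator[str]:
    """Yield the token itself, then forms with at most one proclitic
    and at most one enclitic removed (assumed simplification)."""
    for prefix in ("",) + PROCLITICS:
        if not token.startswith(prefix):
            continue
        rest = token[len(prefix):]
        for suffix in ("",) + ENCLITICS:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)] if suffix else rest
            if len(stem) >= 2:  # skip implausibly short stems
                yield stem


def stripped_coverage(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of tokens for which some clitic-stripped form is in the lexicon."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    hits = sum(any(s in lexicon for s in candidate_stems(t)) for t in tokens)
    return hits / len(tokens)


# Toy data (hypothetical): two lexicon entries and five corpus tokens.
lexicon = {"كتاب", "قلم"}
tokens = ["كتاب", "والكتاب", "كتابه", "قلم", "مدرسة"]

exact = exact_coverage(tokens, lexicon)
stripped = stripped_coverage(tokens, lexicon)
```

On this toy data, exact matching finds two of the five tokens, while stripping clitics also recovers the two inflected forms of the first entry, which mirrors the direction of the gap between the two reported coverage figures.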
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Sawalha, MS; Atwell, ES |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 03 Dec 2014 11:02 |
Last Modified: | 23 Jan 2018 05:27 |
Published Version: | http://www.lrec-conf.org/proceedings/lrec2010/pdf/... |
Status: | Published |
Publisher: | European Language Resources Association (ELRA) |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82341 |
Available Versions of this Item
- Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic. (deposited 15 Nov 2010 18:13)
- Constructing and using broad-coverage lexical resource for enhancing morphological analysis of Arabic. (deposited 03 Dec 2014 11:02) [Currently Displayed]