Alosaimy, AMS and Atwell, ES (2015) A review of morphosyntactic analysers and tag-sets for Arabic corpus linguistics. In: Corpus Linguistics 2015, 21-24 Jul 2015, Lancaster, UK. pp. 16-19.
Abstract
Geoffrey Leech applied his expertise in English grammar to the development of Part-of-Speech tag-sets and taggers for English corpora, including the LOB and BNC tag-sets and tagged corpora. He also developed the EAGLES standards for morphosyntactic tag-sets and taggers for European languages. We have extended this line of research to Arabic: we present a review of morphosyntactic analysers and tag-sets for Arabic corpus linguistics. The field of Arabic NLP has received many contributions in recent decades. Many analysers address the rich morphology of Modern Standard Arabic text, and at least six morphological analysers were freely available at the time of writing. However, choosing between these tools is challenging. In this extended abstract, we discuss the outputs of these different tools and show the challenge of comparing them. The goal of this abstract is not to evaluate these tools but to show how they differ. We also aim to ease the building of an infrastructure that can evaluate every tool against common criteria and produce a universal POS tagging.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Alosaimy, AMS; Atwell, ES |
Keywords: | morphosyntactic; arabic; part of speech; morphological analysers |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 25 Apr 2016 14:32 |
Last Modified: | 25 Apr 2016 14:44 |
Published Version: | http://ucrel.lancs.ac.uk/cl2015/doc/CL2015-Abstrac... |
Status: | Published |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:94413 |