Alosaimy, A and Atwell, E orcid.org/0000-0001-9395-3764 (2018) Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers. International Journal of Computational Linguistics, 9 (1). pp. 1-12. ISSN 2180-1266
Abstract
We present and compare three methods for aligning the morphemes produced by four different Arabic POS-taggers, as well as one baseline method that uses only the provided labels. We combined four Arabic POS-taggers: MADAMIRA (MA), Stanford Tagger (ST), AMIRA (AM), and Farasa (FA); as the target output we used two Classical Arabic gold standards: the Quranic Arabic Corpus (QAC) and the SALMA Standard Arabic Linguistics Morphological Analysis (SAL). We justify why we opt to align on labels rather than on word forms. The problem is not trivial, as it involves six different tokenisation and labelling standards. Supervised learning using a unigram model achieved the best segment alignment accuracy, correctly aligning 97% of morpheme segments. We then evaluated the alignment methods extrinsically, in terms of their effect on the accuracy of ensemble POS-taggers that merge different combinations of the four Arabic POS-taggers. Using the best approach to align the input POS-taggers, the ensemble tagger correctly segmented and tagged 88.09% of morphemes. We show that increasing the number of input taggers raises the accuracy, suggesting that the input taggers make different errors.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Alosaimy, A; Atwell, E (ORCID: 0000-0001-9395-3764) |
Copyright, Publisher and Additional Information: | This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Arabic, POS-Tagging, Segmentation, Tokenisation, Morphological Alignment |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: EPSRC; Grant number: EP/K015206/1 |
Depositing User: | Symplectic Publications |
Date Deposited: | 03 Jan 2018 16:33 |
Last Modified: | 23 Jun 2023 22:41 |
Published Version: | http://www.cscjournals.org/manuscript/Journals/IJC... |
Status: | Published |
Publisher: | CSC Journals |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:125568 |