Alghamdi, AAO, Atwell, E orcid.org/0000-0001-9395-3764 and Brierley, C (2016) An empirical study of Arabic formulaic sequence extraction methods. In: Proceedings of the LREC 2016. LREC 2016 10th International Conference on Language Resources and Evaluation, 23-28 May 2016, Portorož, Slovenia, pp. 502-506. ISBN 978-2-9517408-9-1
Abstract
This paper implements what is referred to as the "collocation of Arabic keywords" approach to extracting formulaic sequences (FSs): high-frequency but semantically regular formulas that are not restricted to any syntactic construction or semantic domain. The study applies several distributional semantic models to automatically extract relevant FSs related to Arabic keywords. The data sets used in this experiment are drawn from a newly developed corpus-based Arabic wordlist of 5,189 lexical items representing a variety of Modern Standard Arabic (MSA) genres and regions. The wordlist is based on overlapping frequency, derived from a comprehensive comparison of four large Arabic corpora with a total size of over 8 billion running words. Empirical n-best precision evaluation methods are used to determine the best association measures (AMs) for extracting high-frequency and meaningful FSs. The gold-standard reference list of FSs was developed in previous studies and manually evaluated against well-established quantitative and qualitative criteria. The results demonstrate that the MI.log_f AM achieved the highest results in extracting significant FSs from the large MSA corpus, while the T-score AM achieved the worst results.
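The abstract names two association measures and an n-best precision evaluation but gives no formulas. The sketch below illustrates how such a comparison could be set up. It assumes commonly cited definitions (MI as log2 of observed over expected co-occurrence, MI.log_f as MI multiplied by the natural log of the co-occurrence frequency plus one, and T-score as observed minus expected over the square root of observed); the paper's exact formulas may differ. All function names and the toy candidates are hypothetical and are not data from the study.

```python
import math

# Assumed definitions, not taken from the paper:
#   f_xy = co-occurrence frequency of keyword and collocate
#   f_x, f_y = individual frequencies, n = corpus size in tokens

def mi(f_xy, f_x, f_y, n):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))

def mi_log_f(f_xy, f_x, f_y, n):
    """MI weighted by log frequency, which damps rare-pair bias (assumed form)."""
    return mi(f_xy, f_x, f_y, n) * math.log(f_xy + 1)

def t_score(f_xy, f_x, f_y, n):
    """T-score: (observed - expected) / sqrt(observed)."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)

def n_best_precision(scores, gold, n):
    """Share of the n highest-scoring candidates found in the gold list."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    return sum(1 for cand in ranked if cand in gold) / n

# Hypothetical toy counts: candidate -> (f_xy, f_x, f_y)
counts = {
    "cand_a": (50, 200, 300),
    "cand_b": (5, 10, 20),
    "cand_c": (80, 5000, 6000),
}
corpus_size = 1_000_000
gold = {"cand_a", "cand_b"}  # hypothetical gold-standard FSs

for name, measure in [("MI.log_f", mi_log_f), ("T-score", t_score)]:
    scores = {c: measure(*fs, corpus_size) for c, fs in counts.items()}
    print(name, n_best_precision(scores, gold, n=2))
```

Ranking the same candidate list under each measure and comparing precision at a fixed n is what lets measures be compared against one gold-standard list.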
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | (c) 2016, The European Language Resources Association. The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License |
Keywords: | Arabic, Formulaic Sequence, Association Measures |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 19 Jul 2016 14:54 |
Last Modified: | 19 Jul 2016 14:54 |
Published Version: | http://www.lrec-conf.org/proceedings/lrec2016/inde... |
Status: | Published |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:102639 |