Alshutayri, A orcid.org/0000-0001-8550-0597, Atwell, ES orcid.org/0000-0001-9395-3764, Alosaimy, A et al. (3 more authors) (2016) Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts. In: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2016). VarDial 2016, 12 Dec 2016, Osaka, Japan, pp. 204-211.
Abstract
This paper describes an Arabic dialect identification system that we developed for the Discriminating Similar Languages (DSL) 2016 shared task. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data analytics tool, which contains many alternative filters and classifiers for machine learning. We experimented with several classifiers, and the best accuracy was achieved using the Sequential Minimal Optimization (SMO) algorithm for training and testing, applied with three different feature sets, one for each testing run. Our approach achieved an accuracy of 42.85%, which is considerably worse than the evaluation scores of 80-90% on the training set and the accuracy of around 50% obtained with a 60:40 percentage split of the training set. We observed that the Buckwalter transcripts from the Saarland Automatic Speech Recognition (ASR) system are given without short vowels, even though the Buckwalter system has notation for them. We elaborate on these observations, describe our methods and analyse the training dataset.
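The short-vowel observation can be illustrated with a minimal Python sketch. It is not the authors' code and assumes nothing about the Saarland ASR output format. It relies only on the standard Buckwalter convention that `a`, `u` and `i` encode fatha, damma and kasra; the function names are hypothetical, chosen for this illustration.

```python
# Buckwalter symbols for the three Arabic short vowels.
# In the standard Buckwalter table these characters are used only as
# diacritics; the long vowels are written A (alif), w (waw) and y (ya).
SHORT_VOWELS = {"a": "fatha", "u": "damma", "i": "kasra"}


def has_short_vowels(text: str) -> bool:
    """Return True if a Buckwalter string contains any short-vowel mark."""
    return any(ch in SHORT_VOWELS for ch in text)


def strip_short_vowels(text: str) -> str:
    """Drop short-vowel marks, leaving the consonantal skeleton."""
    return "".join(ch for ch in text if ch not in SHORT_VOWELS)


if __name__ == "__main__":
    vocalised = "kataba"  # illustrative fully vowelled form
    print(has_short_vowels(vocalised))
    print(strip_short_vowels(vocalised))
```

A transcript for which `has_short_vowels` returns False on every token is unvocalised in the sense the abstract describes.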
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Alshutayri, A; Atwell, ES; Alosaimy, A et al. (3 more authors) |
Copyright, Publisher and Additional Information: | This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/ |
Keywords: | Arabic, Language, WEKA, Machine Learning, Dialect, Classifier, Automatic Speech Recognition, Transcript |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures & Societies (Leeds) > Arabic & Middle Eastern Studies (Leeds); The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 14 Nov 2016 15:27 |
Last Modified: | 05 Oct 2017 15:09 |
Published Version: | http://web.science.mq.edu.au/~smalmasi/vardial3/pd... |
Status: | Published |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:107396 |