Alahmari, S. orcid.org/0009-0002-6490-3295, Atwell, E. orcid.org/0000-0001-9395-3764, Alsalka, M. orcid.org/0000-0003-3335-1918 et al. (1 more author) (2025) Evaluating the Performance of LLMs When Translating Saudi Arabic as Low Resource Language. In: Artificial Intelligence XLI: 44th SGAI International Conference on Artificial Intelligence, AI 2024, Cambridge, UK, December 17–19, 2024, Proceedings, Part II. 44th SGAI International Conference on Artificial Intelligence (AI 2024), 17-19 Dec 2024, Cambridge, UK. Lecture Notes in Computer Science, 15447. Springer Nature, Cham, Switzerland, pp. 264-269. ISBN: 9783031779176. ISSN: 0302-9743. EISSN: 1611-3349.
Abstract
This paper evaluates the performance of several large language models (LLMs) in translating text from Saudi Arabic, a low-resource language, into English. We examine three state-of-the-art language models: ChatGPT-4, Claude-3, and Palm-2. We assess these LLMs on the Arabic Semantic Textual Similarity (STS) dataset. The evaluation covers standard evaluation metrics, prompt design, and comparison with three baseline systems: Google Translator, QuillBot Translator, and Systran Translator. We also conducted a human evaluation of the generated translations and analysed the most frequent translation errors across the models using our sample dataset. Our findings show that the ChatGPT (GPT-4) model is the strongest at handling and translating dialectal Arabic, achieving the highest Bilingual Evaluation Understudy (BLEU) score among all evaluated models (46.56).
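For readers unfamiliar with the BLEU metric reported above, the following is a minimal sketch of corpus-level BLEU with one reference per segment. It is an illustration only, not the authors' evaluation pipeline: the function names `ngrams` and `corpus_bleu`, the whitespace tokenisation, the absence of smoothing, and the 0 to 100 scaling are all assumptions made here, and established tools apply their own tokenisation rules, so their scores will generally differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

In this sketch, a hypothesis identical to its reference scores 100, and one sharing no 4-gram with its reference scores 0, which is why short segments are usually scored at corpus level or with smoothing.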
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-77918-3_22. |
| Keywords: | ChatGPT; Claude; Palm; Large Language Models; Evaluation of AI Systems; Machine Translation; Saudi Arabic Dialect |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 26 Nov 2025 10:22 |
| Last Modified: | 29 Nov 2025 01:30 |
| Published Version: | https://link.springer.com/chapter/10.1007/978-3-03... |
| Status: | Published |
| Publisher: | Springer Nature |
| Series Name: | Lecture Notes in Computer Science |
| Identification Number: | 10.1007/978-3-031-77918-3_22 |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:234853 |
