The Impact of Non-Arabic Dominant Training Data on Distorting Arabic–English AI Translation

How to Cite

Ali, A. M., & Qasim, R. S. (2026). The Impact of Non-Arabic Dominant Training Data on Distorting Arabic–English AI Translation. Manar Elsharq Journal for Literature and Language Studies, 4(1), 79–90. https://doi.org/10.56961/mejlls.v4i1.1335

Abstract

This study investigates the distortions that non-Arabic dominant training data introduce into Arabic-to-English AI translation. It argues that an imbalanced training corpus produces not only isolated technical errors but also systematic errors in meaning, style, and usage. Using a descriptive–analytical comparative methodology, the study examines AI-generated translations of Arabic texts of various types and compares them with human reference translations. A cause-and-effect relationship is established between the dominance of non-Arabic data and the recurrent dilution of meaning, normalization of style, and misalignment of pragmatics. The study attributes these distortions largely to the internalization of Anglophone language norms by AI models trained on predominantly non-Arabic data. It concludes that, to ensure accuracy, cultural faithfulness, and fairness, the training and evaluation frameworks for Arabic–English AI translation must be linguistically fair.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2026 Abdulmalek Marwan Ali, Raed Sabah Qasim