The Impact of Non-Arabic Dominant Training Data on Distorting Arabic–English AI Translation

Abdulmalek Marwan Ali; Raed Sabah Qasim

doi:10.56961/mejlls.v4i1.1335

Vol. 4 No. 1 (2026), Articles


https://doi.org/10.56961/mejlls.v4i1.1335

Published 2026-02-22

Abdulmalek Marwan Ali

Department of English Language, College of Education, Alnoor University

Raed Sabah Qasim

Department of English Language, College of Education, Alnoor University

How to Cite

Ali, A. M., & Qasim, R. S. (2026). The Impact of Non-Arabic Dominant Training Data on Distorting Arabic–English AI Translation. Manar Elsharq Journal for Literature and Language Studies, 4(1), 79–90. https://doi.org/10.56961/mejlls.v4i1.1335

Abstract

This study investigates the distortions that non-Arabic dominant training data introduce into Arabic-to-English AI translation. It argues that an unbalanced training corpus produces not only technical errors but also systematic errors in meaning, style, and usage. Using a descriptive–analytical comparative methodology, the study examines AI-generated translations of Arabic texts of various types and compares them with human reference translations. It establishes a cause-and-effect relationship between the dominance of non-Arabic data and recurrent dilution of meaning, normalization of style, and pragmatic misalignment. These distortions stem disproportionately from the internalization of Anglophone language norms in AI models trained on non-Arabic data. The study concludes that, to ensure accuracy, cultural faithfulness, and fairness, the training and evaluation frameworks for Arabic–English AI translation must be linguistically fair.

This work is licensed under a Creative Commons Attribution 4.0 International License.
