A Comparative Study of ChatGPT and Claude AI in Evaluating CEFR-Based Arabic Writing

Hakiem, Marsekal Rahman; Faturrahman, Muhammad Irfan; Auliansyah, Ach. Afrian

doi:10.48311/LRR/lrr.2025.116362.0


Articles in Press

Article type: Research article



Abstract

This study aims to: 1) evaluate and analyse the performance of ChatGPT and Claude AI in assessing Arabic writing skills against CEFR indicators; and 2) compare the two AI platforms in terms of score accuracy, scoring consistency, and feedback quality, identifying the strengths and weaknesses of each in order to provide practical recommendations for implementing AI in the assessment of Arabic writing skills. The method is a comparative analysis of the scores that the two AI models assigned to 10 participant scripts spanning various CEFR levels. The scores from both models were analysed using the Intraclass Correlation Coefficient (ICC) to measure the level of agreement between raters. The study also identifies the strengths and limitations of each model with respect to linguistic dimensions, cohesion, and relevance of content to the prompt. The findings indicate that ChatGPT and Claude AI both have potential as tools for assessing Arabic texts, but that their accuracy and consistency vary across CEFR levels. This study is expected to serve as an initial foundation for developing an AI-based automatic evaluation system for Arabic language learning worldwide.
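As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of an ICC computation for two raters scoring the same scripts. The abstract does not state which ICC form the authors used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption. The function name `icc2_1` and the scores are hypothetical and are not taken from the study.

```python
import numpy as np


def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is an (n_scripts, n_raters) table. The ICC form is an
    assumption; the abstract does not say which variant was used.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between scripts
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols                # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores for 10 scripts (NOT data from the study):
# column 0 = first AI rater, column 1 = second AI rater.
example_scores = [
    [62, 65], [70, 68], [55, 60], [81, 79], [90, 88],
    [47, 52], [73, 75], [66, 61], [84, 86], [58, 57],
]
print(f"ICC(2,1) = {icc2_1(example_scores):.3f}")
```

Values close to 1 indicate that the two raters assign nearly the same score to each script, while values near 0 indicate that their disagreement is as large as the differences between scripts.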

Keywords

artificial intelligence

Arabic language

CEFR

writing assessment

Subjects