A Comparative Study of ChatGPT and Claude AI in Evaluating CEFR-Based Arabic Writing

Document Type: Research article

Authors
1 Universitas Madani
2 Universitas Ahmad Dahlan
3 State Islamic University of Maulana Malik Ibrahim
DOI: 10.48311/LRR/lrr.2025.116362.0
Abstract
This study aims to: 1) evaluate and analyse the performance of ChatGPT and Claude AI in assessing Arabic writing skills against CEFR indicators; and 2) compare the two AI platforms in terms of score accuracy, scoring consistency, and feedback quality, identifying their strengths and weaknesses in order to offer practical recommendations for using AI to assess Arabic writing skills. The method is a comparative analysis of the scores the two AI models assigned to 10 participant scripts spanning various CEFR levels. The scores from both models were analysed using the Intraclass Correlation Coefficient (ICC) to measure inter-rater agreement. The study also identifies the strengths and limitations of each model with respect to linguistic features, cohesion, and relevance of content to the prompt. The findings indicate that ChatGPT and Claude AI both have potential as tools for assessing Arabic texts, but their accuracy and consistency vary across CEFR levels. This study is intended to serve as an initial foundation for developing an AI-based automatic evaluation system for global Arabic language learning.
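
The abstract does not state which form of the ICC was used. As a minimal sketch, assuming a fully crossed design in which each of the n scripts is scored by the same k raters (here k = 2: ChatGPT and Claude AI), the code below computes two common Shrout and Fleiss forms from two-way ANOVA mean squares: ICC(2,1) (two-way random effects, absolute agreement, single rater) and ICC(3,1) (two-way mixed effects, consistency, single rater). The function names and the example scores are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def _mean_squares(scores: Sequence[Sequence[float]]) -> Tuple[float, float, float, int, int]:
    """Return (MS_rows, MS_cols, MS_error, n, k) for an n x k score table.

    Rows are scripts (subjects); columns are raters.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares: between scripts, between raters, total, and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_rows, ms_cols, ms_error, n, k


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): absolute agreement; a systematic offset between raters lowers it."""
    msr, msc, mse, n, k = _mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_3_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(3,1): consistency; a constant offset between raters is ignored."""
    msr, _, mse, _, k = _mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse)


if __name__ == "__main__":
    # Hypothetical scores for illustration only (NOT results from the study).
    # One row per script; columns are (ChatGPT, Claude AI).
    example = [[4, 5], [6, 6], [3, 4], [8, 7], [5, 5]]
    print("ICC(2,1):", round(icc_2_1(example), 3))
    print("ICC(3,1):", round(icc_3_1(example), 3))
```

The distinction matters for comparing AI raters: if one model consistently scores every script a fixed amount higher than the other, ICC(3,1) can still be 1 while ICC(2,1) drops, so the choice of form should follow from whether absolute score agreement or rank-order consistency is the property of interest.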

Articles in Press, Accepted Manuscript
Available Online from 21 November 2025