On the Accuracy of English Language Teachers’ Writing Assessment

Azizi, Masoud

doi:10.48311/LRR/lrr.2021.12524


Document Type : Research article

Author

Masoud Azizi

Assistant Professor, Department of Foreign Languages, Amirkabir University of Technology, Tehran, Iran


Abstract

Unless sufficient caution is exercised in assessing second or foreign language learners’ writing performance, the decisions based on that assessment cannot be trusted. Because expert or trained raters are often unavailable, or too costly to employ, in most educational contexts, writing assessment is frequently carried out by language instructors, who may lack adequate competence in teaching and assessing L2 writing. This makes it necessary to investigate the accuracy of ratings assigned by language teachers. To this end, 30 language teachers in three groups, each with a different background in teaching English and L2 writing, were selected, and their ratings of 30 IELTS samples were compared against those of expert raters using one-way ANOVA tests. A statistically significant difference was found among the raters for the total writing score as well as for the four components: the L2 writing teachers performed closest to the expert rater, whereas language teachers with little or no background in teaching L2 writing showed the lowest accuracy. Moreover, the only significant correlations were those between the ratings of the writing teachers and those of the expert rater, indicating that only the writing teachers interpreted the scoring criteria in a way not significantly different from the expert rater’s. The results demonstrate that language teachers are not generally suitable writing raters, as they are affected by their own teaching background and understanding of the rating criteria.

Keywords

L2 writing

Assessment

rater training

teacher raters

rating accuracy

20.1001.1.23223081.1400.12.5.24.8

Subjects

Teacher Training
