The validity of the speaking scoring rubric in the Ferdowsi Persian Proficiency Test

Document Type: Scientific Research Article

Authors
1 PhD Candidate in Persian Language and Literature, Ferdowsi University, Mashhad, Iran.
2 Assistant Professor of Persian Language and Literature, Ferdowsi University, Mashhad, Iran.
Abstract
The ability to speak is an important part of everybody’s language proficiency, and it plays an important role in the academic life of students. Scoring and assessing speaking, however, is not easy. In this research, we study the validity of the speaking scoring rubric of Ferdowsi University’s Persian proficiency test. Every test carries a certain amount of error; but in scoring speaking ability, if the scoring rubric is designed in a scientific way, the score attributed to the speakers’ speech is likely to be very close to their actual language ability. In other words, an appropriate scoring rubric can significantly reduce the error rate of the test. In norm-referenced tests, this can be achieved only when test designers can state which scoring constructs they intend to measure and how successful they are in measuring them. It should also be clear whether the scoring scale can distinguish weak, medium, and strong test takers, and, in applying the scoring rubric, how much consensus the scorers reach. To see how successful the scoring rubric of the Ferdowsi Persian proficiency test is in measuring test takers’ speaking ability, the authors analyzed the results of one of the proficiency tests administered at Ferdowsi University using the Rasch model and factor analysis. The results showed a scorer reliability of 0.97, which is very high and indicates that the scorers share the same understanding of the scoring rubric. This means that the scorers have given the test takers relatively stable scores, which is a strong point for the test. The scorers have also applied the scoring rubric properly, because the cut scores rise in an orderly way as the ability of the test takers increases. Each of the four thresholds obtained by the Rasch model differs from the next by approximately 5 degrees, and this regular increase in thresholds is commensurate with the ability of the test takers.
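As a minimal illustration of the kind of consistency check reported above, the sketch below computes a Pearson correlation between two scorers' ratings of the same speakers. The score lists are invented for the example, and Pearson correlation is only one of several inter-rater indices; it is not necessarily the exact statistic behind the 0.97 figure in the study.

```python
# Hypothetical sketch of an inter-rater consistency check.
# The rubric scores below are invented for illustration only.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Two hypothetical scorers rating the same ten test takers on a 0-5 rubric.
rater1 = [1, 2, 2, 3, 4, 4, 5, 3, 2, 5]
rater2 = [1, 2, 3, 3, 4, 5, 5, 3, 2, 4]

print(round(pearson(rater1, rater2), 2))  # prints 0.91
```

A value near 1 suggests the two scorers rank and space the test takers similarly, i.e., they interpret the rubric's grade descriptors in the same way.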
This indicates that the scorers correctly understand the 5 grades specified in the scoring rubric; in other words, they have a good grasp of the test takers’ competence levels and of how these relate to the grades in the rubric. The Wright map shows that the scoring rubric can differentiate basic, intermediate, and advanced test takers well, although at the top of the map there are 8 test takers for whom no adequate score exists, which means the rubric needs some higher score bands for them. On the other hand, the factor loadings of the three constructs, delivery, language use, and topic development, are 0.74, 0.78, and 0.76 respectively. This shows that dividing speaking ability into these three constructs is appropriate, with language use carrying the highest factor loading and delivery the lowest.
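The threshold pattern described above can be checked mechanically: Rasch category thresholds should increase monotonically, and roughly even spacing suggests the grade descriptors are used consistently. The sketch below performs such a check on invented threshold values (chosen to mimic the roughly 5-unit spacing the abstract reports; they are not the paper's estimates).

```python
# Hypothetical sketch: checking that Rasch rating-scale thresholds are
# ordered and roughly evenly spaced. Threshold values are invented.

def threshold_report(thresholds, tolerance=0.25):
    """Return (ordered, evenly_spaced, gaps) for a list of thresholds.

    ordered        -- every threshold exceeds the previous one
    evenly_spaced  -- no gap deviates from the mean gap by more than
                      `tolerance` (as a fraction of the mean gap)
    """
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    ordered = all(g > 0 for g in gaps)
    mean_gap = sum(gaps) / len(gaps)
    evenly_spaced = ordered and all(
        abs(g - mean_gap) <= tolerance * mean_gap for g in gaps
    )
    return ordered, evenly_spaced, gaps

# Four invented thresholds for a 5-grade rubric, ~5 units apart.
ordered, even, gaps = threshold_report([-7.5, -2.5, 2.5, 7.5])
print(ordered, even, gaps)  # prints: True True [5.0, 5.0, 5.0]
```

Disordered or bunched thresholds would instead signal that adjacent rubric grades are not being distinguished by the scorers.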


References

Jalili, Seyyed Akbar. (1390). Persian proficiency test (AMFA) based on the four main language skills [in Persian]. MA thesis. Allameh Tabataba'i University, Tehran, Iran.
Golpour, Leila. (1394). Designing and validating a Persian language proficiency test based on the four language skills [in Persian]. PhD dissertation. Payame Noor University, Tehran, Iran.
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F. (2004). Statistical Analysis for Language Assessment. New York: Cambridge University Press.
Brown, H. D. (2001). Teaching by Principles: An Interactive Approach to Language Pedagogy. Englewood Cliffs, NJ: Prentice Hall Regents.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the Test of English as a Foreign Language. New York: Routledge.
De Bot, K. (1992). “A bilingual production model: Levelt’s speaking model adapted”. Applied Linguistics, 13, 1-24.
Douglas, D. (1997). Testing Speaking Ability in Academic Context: Theoretical Considerations (TOEFL No. 8). Princeton, NJ: ETS.
Ellis, R. (1985). Understanding second language acquisition. Oxford: Oxford University Press.
Ellis, R. (Ed.). (1987). Second language acquisition in context. Englewood Cliffs, NJ: Prentice-Hall International.
Ellis, R. (1990). Instructed second language acquisition. Cambridge, MA: Basil Blackwell.
ETS: Educational Testing Service. (2012). The Official Guide to the TOEFL Test (4th ed.). New York: McGraw-Hill.
Fulcher, G. (2014). Testing Second Language Speaking. New York: Routledge.
Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. New York: Routledge.
Hadden, B. L. (1991). Teacher and non-teacher perceptions of second-language communication. Language Learning, 41, 1-24.
Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.
Kane, M. T. (1999). Validating measures of performance. Educational Measurement: Issues and Practice, 18(2), 5-17.
Larsen-Freeman, D., & Long, M. (1991). An Introduction to Second Language Acquisition Research. New York: Longman.
Levelt, W. (1989). Speaking: From Intention to Articulation. Cambridge, MA: The MIT Press.
Linacre, J. M. (2009). WINSTEPS Rasch Measurement [Computer program]. Chicago, IL: Winsteps.
Lissitz, R. W. (Ed.). (2009). The Concept of Validity: Revisions, New Directions and Applications. Charlotte, NC: Information Age Publishing, Inc.
Ludwig, J. (1982). Native-speaker judgments of second language learners’ efforts at communication: A review. Modern Language Journal, 66, 274-283.
Messick, S. (1987). Validity (Report no. RR-87-40). Princeton: ETS.
Pishghadam, R., & Shams, M. A. (2013). A new look into the construct validity of the IELTS speaking module. Journal of Teaching Language Skills, 5(1), 71-99.
Tarone, E. (1983). On the variability of interlanguage systems. Applied Linguistics, 4, 143-63.