Benchmark rating procedure, best of both worlds? Comparing procedures to rate text quality in a reliable and valid manner.

Renske Bouwer*, Monica Koster, Huub van den Bergh

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Assessing students’ writing performance is essential to adequately monitor and promote individual writing development, but it is also a challenge. The present research investigates a benchmark rating procedure for assessing texts written by upper-elementary students. In two studies we examined whether a benchmark rating procedure (1) leads to reliable and generalisable scores that converge with holistic and analytic ratings, and (2) can be used for rating texts varying in topic and genre. The results provide evidence that benchmark ratings are a valid indicator of text quality, as they converge with holistic and analytic scores. They are also associated with less rater variance and less task-specific variance, leading to reliable and generalisable ratings. Moreover, a benchmark scale can be used for rating different tasks with the same reliability, at least when texts are written in the same genre. Taken together, a benchmark rating procedure ensures meaningful and useful information on students’ writing.

Original language: English
Pages (from-to): 302-319
Number of pages: 18
Journal: Assessment in Education: Principles, Policy and Practice
Volume: 30
Issue number: 3-4
Publication status: Published - 11 Aug 2023

Bibliographical note

Publisher Copyright:
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

Funding

The work was supported by the Netherlands Organization for Scientific Research [411-11-859].

Funder: Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Funder number: 411-11-859

    Keywords

    • Writing assessment
    • benchmark rating procedure
    • generalisability
    • reliability
    • validity
