Formative assessment

How assessment can be used to improve instruction

May 14, 2018
In a 1967 monograph, Michael Scriven (1967) suggested that it was useful to distinguish between two kinds of curriculum evaluation process. In the first, he suggested that evaluation “may have a role in the on-going improvement of the curriculum” (p. 41), while in the second, “the evaluation process may serve to enable administrators to decide whether the entire finished curriculum, refined by the use of the evaluation process in its first role, represents a sufficiently significant advance on the available alternatives to justify the expense of adoption by a school system” (pp. 41-42). He also proposed that it would be worthwhile “to use the terms ‘formative’ and ‘summative’ to qualify evaluation in these roles.”

Although Scriven had intended that these terms be used only to apply to the evaluation of curricula, the following year, in his work on ‘mastery learning’, Benjamin Bloom (1968) extended the use of the terms ‘formative’ and ‘summative’ to the evaluation (i.e., assessment) of individual students. In a subsequent paper, he explained the difference:

Quite in contrast is the use of “formative evaluation” to provide feedback and correctives at each stage in the teaching-learning process. By formative evaluation we mean evaluation by brief tests used by teachers and students as aids in the learning process. While such tests may be graded and used as part of the judging and classificatory function of evaluation, we see much more effective use of formative evaluation if it is separated from the grading process and used primarily as an aid to teaching. (Bloom, 1969)

While Bloom’s proposals about mastery learning were influential, and regular, frequent assessment was widely regarded as essential to effective instruction, the term ‘formative assessment’ was not widely used in primary and secondary schools. However, in higher education, and particularly in the UK, many universities introduced what they called “formative assessments” into their courses. These were typically assessments designed to mimic the assessments that students would take at the end of a course, and which allowed students to gauge their progress. While such assessments did, sometimes, provide insights into what a student might do to improve, the emphasis was on indicating the extent of progress towards an educational goal. More importantly, whether an assessment was described as formative or not depended primarily on its location in a sequence of instructional activities—“any assessment before ‘the big one’” (Wiliam, 2010, p. 36) as it were. While such assessments were also, sometimes, claimed to provide insights that might improve learning, there is little evidence that they did so.

Although the term “formative assessment” was not in widespread use, in the second half of the 1980s a number of research reviews appeared indicating that classroom evaluation processes could have a substantial positive (or negative) influence on learning. Some of these, such as the reviews by Natriello (1987) and Crooks (1988), focused more on the negative aspects of classroom assessments, particularly in terms of impact on motivation. Others, such as those by Bangert-Drowns, Kulik, Kulik, and Morgan (1991) and Bangert-Drowns, Kulik, and Kulik (1991), showed the substantial benefits of regular classroom testing for long-term recall, through the effects of priming and what we would now call retrieval practice (Karpicke & Blunt, 2011). Still others looked at the way that regular classroom assessment might support teachers in making instructional adjustments, in the way envisaged by Bloom. In particular, Fuchs and Fuchs (1986) found that when assessment results were used to adjust instruction, especially when teachers used a pre-determined rule to decide what to do in the case of a given assessment outcome, there was a large positive impact on student learning.

When, some years later, Paul Black and I sought to update these reviews (Black & Wiliam, 1998), we realized that using the term ‘formative’ to describe the position an assessment occupied in a course of study, or the assessment itself, represented what Gilbert Ryle (1949) called a “category mistake” (ascribing to something a property it cannot have), since the same assessment procedure could yield evidence that could be used summatively or formatively (Wiliam & Black, 1996). There is, therefore, no such thing as a formative assessment. There are, however, assessments whose results can be used formatively. If, following Cronbach (1971), we define an assessment as a procedure for drawing inferences, then we can use the terms formative and summative to describe the kinds of inferences that we make from assessment results. When the inferences we make are about an individual’s level of achievement, or her or his suitability for a particular programme of study, then the assessment is functioning summatively. When the inferences are about how to improve an individual’s learning, then the assessment is functioning formatively.

This insight provides a clear basis for distinguishing between the terms “assessment for learning” and “formative assessment.” As defined by Black, Harrison, Lee, Marshall, and Wiliam (2004), assessment for learning is “any assessment for which the first priority in its design and practice is to serve the purpose of promoting students’ learning. It thus differs from assessment designed primarily to serve the purposes of accountability, or of ranking, or of certifying competence” (p. 10). Assessment for learning would therefore include the use of assessment to motivate students, or to provide retrieval practice. Such assessment becomes formative assessment when the evidence elicited by the assessment is interpreted and used to improve instructional decisions.

In the 20 years since Black and Wiliam’s review of the impact of classroom assessment processes on learning appeared, the evidence on the value of classroom assessment processes, especially those that focus on providing retrieval practice and supporting instructional adjustments, has accumulated (see chapter 4 of Wiliam, 2016, for a summary of the research). However, many issues remain unresolved. Some of the most significant of these include the magnitude of the effects of such assessment processes on student learning (Bennett, 2011; Kingston & Nash, 2011, 2015), the skills and knowledge that teachers need to implement such assessment effectively (Heitink, Van der Kleij, Veldkamp, Schildkamp, & Kippers, 2016), and the domain-specificity of formative assessment practices (Andrade, Bennett, & Cizek, 2018). Perhaps most significantly, little is known about the best ways of implementing such formative assessment practices at scale.

Further reading

