The papers in this issue address the general question of how to
add value to educational assessments, particularly in terms of
student growth in academic disciplines. In addressing this question, the
papers focus on several recent and emerging model-based
methodologies: in particular, learning progressions and cognitive
models of learning (Pellegrino, this issue; de la Torre & Minchen, this
issue), evidence-centered design (ECD) as a framework for
assessment design and development (Zieky, this issue), and
cognitively based assessment of, for, and as learning (Deane & Song,
this issue; van Rijn et al., this issue).
These model-based methodologies involve major developments
in how we interpret assessment results and, therefore, they have
strong implications for how we evaluate the psychometric quality of
the assessments. The model-based interpretations of assessment results
involve relatively complex descriptions of each student’s achievement,
characterizing the student’s overall level of sophistication in terms of a
profile of skills mastered and not mastered (de la Torre & Minchen, this
issue) or a level in a learning progression (Pellegrino, this issue), rather
than the student’s standing on a unidimensional scale (or on several
scales). The goal is to
develop assessments that promote learning by providing information
that is useful in teaching and learning, and to generate evidence that
supports the proposed interpretation and usefulness of the
assessment results.
Our main point in this paper is that while grounding assessment
design in cognitive theories and model-based methodologies is
highly desirable, rigorous evaluation of the resulting scores is still
necessary. Specifically, the basic definition of validity, as the extent to
which the interpretation and use of test scores are supported by
appropriate evidence and analysis, does not need to change.
However, as discussed in more detail later, the structure of the
arguments used to support the proposed interpretations and uses of
the scores, and the evidence needed to evaluate these arguments, will
need to be adapted to those interpretations and uses.
Similarly, the analyses of the precision, or reliability,
of the results will need to be reconsidered; for example, to the extent
that the focus is on placement in a learning progression rather than
on a score on a continuous scale, analyses of precision would focus
on the consistency of placement in the progression, rather than on
traditional reliability or generalizability coefficients.
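As one illustrative sketch of what such an analysis could look like (an example we supply here, not a procedure taken from the papers in this issue), suppose students are placed into one of K levels of a learning progression on two parallel forms or occasions, and let $p_{jk}$ denote the proportion of students placed at level $j$ on the first occasion and level $k$ on the second. A simple index of placement consistency is the observed agreement rate, with a chance-corrected analogue in the spirit of Cohen’s kappa:
$$
P_o = \sum_{k=1}^{K} p_{kk}, \qquad
\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad
P_e = \sum_{k=1}^{K} p_{k\cdot}\, p_{\cdot k},
$$
where $p_{k\cdot}$ and $p_{\cdot k}$ are the marginal proportions placed at level $k$ on each occasion. Indices of this kind are defined directly on the placement decisions and would play the role that reliability or generalizability coefficients play for scores on a continuous scale.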