Tests
A test is defined as an instrument or systematic procedure for observing and describing one or
more characteristics of a student using either a numerical scale or a classification scheme.
Test is a concept narrower than assessment.
In schools, we usually think of a test as a paper-and-pencil instrument with a series of questions that students must answer.
Teachers usually score these tests by adding together the “points” a student earned on each question.
By using tests this way, teachers describe the student using a numerical scale.
Similarly, a preschool child’s cognitive development could be observed by using the Wechsler Preschool and Primary Scale of Intelligence (see Chapter 19) and described as having a percentile rank of 50 (see Chapter 17).
Not all tests use numerical scales.
Others use systematic observation procedures to place students into categories.
Although it is natural to assume that tests are designed to provide information about an individual, this is not always true. States have testing programs designed to determine whether their schools have attained certain goals or standards.
Although these tests are administered to individual students, a state uses the results to measure the effectiveness of a school.
In such cases, individual names are not associated with scores when reporting to the government.
The “score” for the school system (or for a specific school at a specific grade level) is usually the percentage of the
school’s students who meet or exceed that state’s standards.
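As a minimal sketch of how such a school-level "score" could be computed, the example below counts the share of students at or above a cut score. The scores, the cut score of 300, and the function name are hypothetical values chosen for illustration; they are not taken from any state's actual program.

```python
def percent_meeting_standard(scores, cut_score):
    """Return the percentage of scores at or above the cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    meeting = sum(1 for s in scores if s >= cut_score)
    return 100.0 * meeting / len(scores)


# Hypothetical scores for one grade level, reported without student names.
grade4_scores = [312, 287, 301, 275, 330, 298, 305, 260]
CUT_SCORE = 300  # assumed "meets the standard" threshold

school_score = percent_meeting_standard(grade4_scores, CUT_SCORE)
print(f"{school_score:.1f}% of students met or exceeded the standard")
```

Note that the result describes the school, not any one student: the individual scores are used only to form the percentage.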
Another example of an assessment program designed to survey the educational system rather than individual students is the National Assessment of Educational Progress (NAEP) (nces.ed.gov).
The NAEP assesses the impact of the nation’s educational efforts by describing what students are able to do.
Assessment tasks are assigned to students on a random sampling basis so that not every student has the same or even
comparable tasks.
Thus, it is not meaningful to use the scores with individual students.
The assessment is intended to pool the results from all students in the sample to show the progress of education in the entire country.
The NAEP surveys are efficient ways to gather information about the average performance of a group of students because they assess each student using very few tasks, but pool the results to estimate the average.
However, this gain in efficiency of assessing the group comes at the expense of not being able to describe validly
the achievement of individual students.
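The pooling idea can be illustrated with a small sketch. It assumes a made-up design in which each student answers only two of four tasks; the student labels, task labels, and responses are hypothetical, and the simple averaging shown here is far cruder than the estimation methods an actual survey program would use.

```python
from collections import defaultdict

# Hypothetical responses: 1 = correct, 0 = incorrect.
# Each student sees only two of the four tasks.
responses = {
    "s1": {"t1": 1, "t2": 0},
    "s2": {"t3": 1, "t4": 1},
    "s3": {"t1": 1, "t2": 1},
    "s4": {"t3": 0, "t4": 1},
}


def pooled_task_rates(responses):
    """Pool all students' answers to get the proportion correct per task."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for answers in responses.values():
        for task, correct in answers.items():
            totals[task] += correct
            counts[task] += 1
    return {task: totals[task] / counts[task] for task in totals}


task_rates = pooled_task_rates(responses)
# Group-level estimate: the mean proportion correct across all tasks.
group_average = sum(task_rates.values()) / len(task_rates)
print(task_rates, group_average)
```

In this sketch, students s1 and s2 have no tasks in common, so comparing their two-task totals would say little about their relative achievement. The pooled task rates, by contrast, cover all four tasks and support a statement about the group.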
Measurement
Measurement is defined as a procedure for assigning numbers (usually called scores) to a specified attribute or characteristic of a person in such a way that the numbers describe the degree to which the person possesses the attribute. An important feature of the number-assigning procedure in measurement is that the resulting scores maintain the order that exists in the real world among the people being measured.
At the minimum, this principle would mean, for example, that if you are a better speller than we are, a test that measures our spelling abilities should result in your score (your measurement) being higher
than ours.
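The order-preserving requirement can be stated as a simple check. The sketch below assumes we already know, from some outside source, which person is the better speller in each pair; the names, scores, and function are hypothetical and serve only to illustrate the principle.

```python
def preserves_order(scores, known_pairs):
    """Return True if, for every (better, weaker) pair, the better
    person's score is strictly higher than the weaker person's score."""
    return all(scores[better] > scores[weaker] for better, weaker in known_pairs)


# Hypothetical spelling scores (number of words spelled correctly).
spelling_scores = {"you": 18, "author_a": 14, "author_b": 11}

# Real-world ordering assumed known: you spell better than either author.
known_pairs = [("you", "author_a"), ("you", "author_b")]

order_ok = preserves_order(spelling_scores, known_pairs)
print(order_ok)
```

If the check fails for a pair whose real-world order is known, the scores do not function as measurements of spelling ability in the sense defined above.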
For many of the characteristics measured in education and psychology, the number-assigning procedure is to count the correct answers or to sum points earned on a test. Alternatively, we may use a scale to rate the quality of a student’s
product (for example, an essay or a response to an open-ended mathematics task) or performance (how well the student carries out chemistry lab procedures). (See Chapter 13 for examples.) Most measurement specialists would probably agree that although a counting or rating procedure is crude, as a practical matter scores from assessments are useful when they are validated by using data from research (Kane, 2006).
Thus an assessment may or may not provide measurements. If a procedure describes a student by qualitative labels or categories but not by numbers, the student is assessed, but not measured in the sense used here. Assessment is a broader term than test or measurement because not all types of assessments yield measurements.
Evaluation
Evaluation is defined as the process of making a value judgment about the worth of a student’s product or performance. For example, you may judge a student’s writing as exceptionally good for his grade placement. This evaluation may lead
you to encourage the student to enter a national essay competition. To make this evaluation, you would first have to assess his writing ability. You may gather information by reviewing the student’s journal, comparing his writing to that of
other students and to known quality standards of writing, and so on. Such assessments provide information you may use to judge the quality or worth of the student’s writing. Your judgment that the student’s writing is of high quality would
lead you to decide to encourage him to enter the competition. Evaluations are the bases for decisions about what course of action to follow.
Evaluation may or may not be based on measurements or test results. Among others, evaluations may be based on counting things, using checklists, or using rating scales. Clearly, evaluation does occur in the absence of tests, measurements, and other objective information. You can—and probably often do—evaluate students on the basis of assessments such as systematic observation and qualitative description, without measuring them. Even if objective information is available and used, evaluators must integrate it into their own experiences to come to decisions.
So degrees of subjectivity, inconsistency, and bias influence all evaluations. Testing and measurement, because they are more formal, standardized, and objective than other assessment techniques, reduce some of the inconsistency and subjectivity that influence evaluation. The general public, however, sometimes thinks that because numbers look objective they remove the element of judgment from evaluation; this is called the illusion of “mechanical objectivity” (Porter, 1995, p. 4).
Evaluation of Schools, Programs, or Materials
Not all evaluations are of individual students. You also can evaluate a textbook, a set of instructional materials, an instructional procedure, a curriculum, an educational program, or a school. Each of these things may be evaluated during
development as well as after they are completely developed. The terms formative and summative evaluation are also used to distinguish the roles of evaluation during these two periods (Cronbach, 1963; Scriven, 1967). Historically, these terms
arose first in the context of evaluation of schools or programs and were then applied to students.
The convention has become that “formative and summative evaluation” refers to schools, programs, or materials, and “formative and summative assessment” refers to students. We will follow that convention.
Formative evaluation of schools, programs, or materials is judgment about quality or worth made during the design or development of instructional materials, instructional procedures, curricula, or educational programs. The evaluator uses these judgments to modify, form, or otherwise improve the school, program, or educational material. A teacher also engages in formative evaluation when revising lessons or learning materials based on information obtained
from their previous use.
Summative evaluation of schools, programs, or materials is judgment about the quality or worth of schools, already-completed instructional materials, instructional procedures, curricula, or educational programs. Such evaluations tend to
summarize strengths and weaknesses; they describe the extent to which a properly implemented program or procedure has attained its stated goals and objectives. Summative evaluations appraise the effectiveness of a particular
educational product as well as under what conditions it is effective. Summative evaluations usually are directed less toward providing suggestions for improvement than are formative evaluations.
Evaluation of Students
You may evaluate students for formative or summative purposes, as well. Classroom formative and summative assessments both should be based on the same intended learning outcomes. Figure 1.4 shows common uses for classroom assessment results. The uses are organized into two groups: formative and summative. One use of assessment, controlling students’ behavior, is not listed in Figure 1.4 because it is a poor, and sometimes unethical, practice. Controlling students through assessments turns a process of information gathering into a process of threatening and punishing with negative consequences for learning and self-efficacy.
Formative assessment of students’ achievement means judging the quality of a student’s achievement while the student is still in the process of learning. We make formative assessments of students to guide their next learning steps.
When you ask questions in class to see whether students understand the lesson, for example, you are obtaining information to formatively evaluate their learning. You can then adjust your lesson if students do not understand. Students participate in formative assessment as well, interpreting information about their own performances to adjust their
learning strategies (Moss & Brookhart, 2009). High-quality formative assessment and feedback to students increase student learning (Hattie & Timperley, 2007). In general, formative assessments are less formal than summative assessments. We recommend that you record the results of these assessments to help your memory; however, you
do not use them to report official letter grades or achievement progress.