. Most notably, and in
science even more than in some other school subjects, achievement tests are topic- and course-specific, and the majority of
studies use achievement measures specifically constructed for the task at hand. As a consequence, the quality of instruments
is hard to assess, and in many studies, the experimenter-made measures are more aligned with the content taught in the
experimental condition than with the content in the control condition (e.g., inquiry skills are needed on the test, or the
questions are posed in a context-rich format). On the other hand, one may argue with some justification that standardized tests are
often more aligned with the outcomes of the traditional curriculum. Slavin and Madden (2011), focusing on mathematics and
reading studies reviewed in the U.S. Department of Education's What Works Clearinghouse (WWC), found that measures that
are “inherent” to the treatment (covering content not taught in the control group) are associated with effect sizes that are
much higher compared to measures of the curriculum taught in experimental as well as control groups (d = 0.45 vs.
d = 0.03). By contrast, Schroeder et al. (2007) in their meta-analysis, where 47 out of 62 achievement tests were locally
constructed, found no difference in outcome (d = 0.73 and d = 0.75, respectively).