For the data on examinations and other assessments, a heterogeneity
analysis indicated that average effect sizes were lower
when the outcome variable was performance on an instructor-written course
examination rather than on a concept inventory
(Fig. 3A and Table S1B; Q = 10.731, df = 1, P