CONCLUSIONS
Results suggest that the EBRAS can effectively mediate and structure all four steps in a cycle of assessment (see Figure 3), facilitating valid and reliable evaluation of how students reason with evidence, an important aspect of scientific inquiry. This study showed that meaningful definitions of three proficiencies related to scientific reasoning (conceptual sophistication, specificity, and validity) are possible and that modestly trained raters can consistently interpret and apply these definitions to written classroom work.
An important design feature of the EBRAS is its ability to focus on and reveal multiple important aspects of students’ scientific argumentation that are difficult to observe in traditional science assessments and classroom discourse. In basing the EBRAS on the EBR Framework (Brown et al., this issue), we took deliberate care to probe different components and processes of reasoning and to ensure that both domain-specific and domain-general aspects of students’ use of evidence were addressed. We discovered that the interactions among latent proficiency, item design, and the components and processes of the EBR Framework are complex but largely understandable. In general, the EBRAS constructs reveal stable proficiencies that exert a strong influence on student responses across the items we developed. An important implication is that individual EBRAS items provide more information about student proficiency than items for which there is an objectively correct answer, making EBRAS items more reliable in formative assessment contexts, where a single response is often all that can be evaluated.
Despite the strong influence of the proficiencies measured by the EBRAS, aspects of item design do affect what kind of information an item is particularly effective at revealing. For example, the conceptual sophistication of a response is influenced by which data are focused on and how they are represented in the item stem. Of particular note is the observation that misconceptions are revealed only by items that omit normative scientific terms (e.g., mass, volume, density) and their associated data, regardless of whether the items concern everyday objects or complicated hypothetical combinations of different materials. The specificity of responses is influenced by the reasoning component the item is designed to probe. In general, items eliciting rules or pieces of evidence provide more information about specificity than items eliciting premises or claims. An important finding is that students’ relative proficiency is best revealed by items situated in unfamiliar contexts; familiar contexts, such as objects floating in water with the values of properties represented numerically, increase the use of shorthand language in which units are dropped and exact values are implied rather than stated explicitly. The validity of responses is influenced by the reasoning process the item is designed to probe. Items targeting analysis or interpretation provide more information about validity than items targeting application, and they are particularly useful when designed to present situations in which valid reasoning is particularly difficult, such as extrapolating beyond the range of data or dealing with counterevidence.
Two aspects of students’ written work often attended to by teachers, having the right idea and using correct scientific terminology, have been shown to: (a) each confound the proficiencies associated with the EBRAS constructs; (b) share a positive dependence on conceptual sophistication; and (c) due to their dependence on the EBRAS constructs, both positively predict accuracy on traditional forced-choice science assessment items. These findings help to explain why teachers’ evaluations of students’ scientific reasoning based on these two aspects have been, in our experience, contentious and difficult.
We hope the EBRAS will serve as a valuable resource for assessment developers: a framework that guides the design and implementation of targeted assessments that elicit valid and reliable evidence of the multiple proficiencies underlying scientific reasoning.
ACKNOWLEDGMENTS
This material is based on work supported by the National Science Foundation under Grant 0439062. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.