Constant Comparative Approach
In analyzing student responses, we followed the model presented by Creswell (2007) for a grounded theory constant
comparative research approach. This approach has been used in many studies on student conceptual understanding
(Bailey 2006, Sharma et al. 2004, Asghar and Libarkin 2010), and it has proven to be a reliable method. The central
idea of constant comparative research is an open-coding format in which the researcher collects data and allows categories
to emerge naturally until the information gained is saturated, i.e., until no new categories emerge. In constructing categories of student responses to
each question, we took an approach similar to what Sharma et al. (2004) refer to as a phenomenographic analysis,
which de-emphasizes correctness in favor of probing for a complete description of student ideas.
One author compiled student responses by typing each response into a single document and giving each response a
number tag so that individual cases could be tracked. In an exploratory fashion, the student responses were read,
reread, and rearranged to determine the emergent themes for each question. Once the general themes, or categories,
were determined, each author followed the constant comparative method by separately coding the numbered student
responses into these categories. To illustrate how these codings were compiled and analyzed, consider an
example of ten student responses (#1 – #10) to a question that has two conceptual categories, ‘a’ and ‘b.’ Table 1
shows how each rater in this example coded each student response. The raters agreed that four responses (N = 4) fit
in Category ‘a’ and three responses (N = 3) fit in Category ‘b.’ The raters did not agree on the categorization of two
responses (#5 and #9). For example, Rater 1 coded response #5 in Category ‘b’ while Rater 2 coded it in Category
‘a.’ Also note that both raters deemed response #2 inapplicable to the question.
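To make the worked example concrete, the short sketch below (in Python, used purely for illustration) tallies a Table 1-style agreement matrix from two hypothetical coding lists. Only the codings of responses #2 and #5, and the fact that #9 was a disagreement, are fixed by the description above; the remaining assignments, including the direction of the disagreement on #9, are illustrative assumptions.

```python
from collections import Counter

# Hypothetical reconstruction of the ten-response example: each rater assigns
# responses #1-#10 to Category 'a', Category 'b', or 'n/a' (inapplicable).
# Only #2 (both 'n/a'), #5 (Rater 1 'b', Rater 2 'a'), and the fact that #9 is
# a disagreement come from the text; everything else is an illustrative guess.
rater1 = {1: 'a', 2: 'n/a', 3: 'a', 4: 'b', 5: 'b', 6: 'a', 7: 'b', 8: 'a', 9: 'a', 10: 'b'}
rater2 = {1: 'a', 2: 'n/a', 3: 'a', 4: 'b', 5: 'a', 6: 'a', 7: 'b', 8: 'a', 9: 'b', 10: 'b'}

# Tally the Table 1-style matrix: rows are Rater 1's codes, columns Rater 2's.
cells = Counter((rater1[n], rater2[n]) for n in rater1)
categories = ['a', 'b', 'n/a']

print('R1\\R2 ' + ' '.join(f'{c:>4}' for c in categories))
for r1 in categories:
    row = ' '.join(f'{cells[(r1, r2)]:>4}' for r2 in categories)
    print(f'{r1:>5} {row}')
# Diagonal cells (4 + 3 + 1 = 8) are agreements; the two off-diagonal cells
# correspond to the disagreements on responses #5 and #9 described in the text.
```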
To measure the degree to which our two codings of student responses about gravity agreed, we used a matrix
similar to Table 1 to calculate the Cohen’s kappa inter-rater reliability statistic, κ, for each question. Inter-rater
reliabilities greater than 0.80 are generally accepted as indicating good agreement (Landis and Koch 1977). If the categorizations
for any question resulted in a κ below 0.80, discussions and clarifications were carried out iteratively
until acceptable agreement was reached, but no explicit information was exchanged as to which responses
each researcher had placed in each category.
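As a sketch of the reliability statistic itself, the following computes Cohen’s κ from a Table 1-style matrix using the standard definition κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the raters’ marginal totals. The matrix values mirror the hypothetical ten-response example above, not our actual data; with these numbers κ ≈ 0.66, i.e., the below-0.80 situation that would trigger another round of discussion and independent recoding.

```python
# Cohen's kappa for the hypothetical ten-response example above.
# Rows are Rater 1's codes, columns Rater 2's, in the order ['a', 'b', 'n/a'].
matrix = [
    [4, 1, 0],  # Rater 1: 'a'
    [1, 3, 0],  # Rater 1: 'b'
    [0, 0, 1],  # Rater 1: 'n/a'
]

n = sum(sum(row) for row in matrix)                      # total paired codings
p_o = sum(matrix[i][i] for i in range(len(matrix))) / n  # observed agreement

row_totals = [sum(row) for row in matrix]                # Rater 1 marginals
col_totals = [sum(col) for col in zip(*matrix)]          # Rater 2 marginals
p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2  # chance agreement

kappa = (p_o - p_e) / (1 - p_e)
print(f"p_o = {p_o:.2f}, p_e = {p_e:.2f}, kappa = {kappa:.2f}")
# With this matrix: p_o = 0.80, p_e = 0.42, kappa is roughly 0.66 (below 0.80).
```

The same value can also be obtained from the per-response code lists with a library routine such as scikit-learn’s cohen_kappa_score, if one prefers not to compute the marginals by hand.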