Using the three-tier scoring system for the notes and the detail tests, we determined an overall score indicating how many details were recorded in the notes and on the recall test, as well as specific values indicating the quality (or complete absence) of the answer for each detail. Although related to the overall score, the failure and excellent counts provided a level of information in our analysis not present in other note-taking studies. Thus, for both note taking and the free recall test, we recorded the overall detail score, the number of failures, and the number of excellent/perfectly worded details.
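As a concrete illustration of how the per-detail codes aggregate into these three measures, the sketch below (in Python) assumes a numeric coding of 0 = failure, 1 = partial, and 2 = excellent for each target detail; the numeric labels are an assumption made for illustration, not the rubric's actual values.

```python
def summarize_detail_codes(codes):
    """Aggregate per-detail codes into the three summary measures.

    codes: list of per-detail codes, assumed here to be
           0 = failure (missing/wrong), 1 = partial,
           2 = excellent/perfectly worded.
    """
    return {
        "overall_score": sum(codes),   # overall detail score
        "failures": codes.count(0),    # details missing or wrong
        "excellent": codes.count(2),   # perfectly worded details
    }

# Example: hypothetical codes for one participant's notes across eight details
print(summarize_detail_codes([2, 1, 0, 2, 1, 1, 0, 2]))
# -> {'overall_score': 9, 'failures': 2, 'excellent': 3}
```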
A subset of 10% of the total participant materials (N = 47) was randomly chosen to assess consistency between the coders. Both coders independently graded the notes and free recall tests from this subset, and their coding sheets were then used to calculate intercoder reliability. After minor differences between the two coders were addressed, Cohen's kappa was calculated for overall details recorded in students' notes and for details recorded on the free recall test. Cohen's kappa was .84 for note details and .77 for recall test details; Landis and Koch (1977) characterize kappa values between .61 and .80 as indicating substantial agreement. In addition, percent agreement statistics indicated that the coders agreed nearly 95% of the time. The KR-20 for the multiple-choice test was .524. Although this reliability estimate is lower than desired, the formula includes error introduced by the various experimental conditions in this study. That variance in students' scores, coupled with the relatively small number of questions, likely makes this an underestimate of the test's actual consistency.
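For readers who want the reliability statistics spelled out, the Python sketch below shows how Cohen's kappa, percent agreement, and KR-20 are conventionally computed from coding sheets and dichotomously scored test items. The function names and data layout are illustrative assumptions, not the analysis scripts used in this study.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Proportion of items on which the two coders assigned the same code."""
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    n = len(codes_a)
    p_o = percent_agreement(codes_a, codes_b)   # observed agreement
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Expected agreement if the coders assigned codes independently,
    # based on each coder's marginal proportions per category.
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(codes_a) | set(codes_b))
    return (p_o - p_e) / (1 - p_e)

def kr20(item_scores):
    """Kuder-Richardson 20 for dichotomously scored (0/1) test items.

    item_scores: list of per-student lists, each holding 0/1 item scores.
    """
    n_students = len(item_scores)
    k = len(item_scores[0])                      # number of test items
    totals = [sum(student) for student in item_scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    # p_i = proportion answering item i correctly; q_i = 1 - p_i
    pq_sum = 0.0
    for i in range(k):
        p_i = sum(student[i] for student in item_scores) / n_students
        pq_sum += p_i * (1 - p_i)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Hypothetical detail codes from the two coders for one coding sheet
coder_a = [2, 1, 0, 2, 1, 1, 0, 2]
coder_b = [2, 1, 0, 2, 2, 1, 0, 2]
print(percent_agreement(coder_a, coder_b))  # 0.875
print(cohens_kappa(coder_a, coder_b))       # roughly .81
```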