Such analyses have proven to be highly reliable and have provided important information about students’ thinking
and reasoning in previous studies. To ensure high reliability of both the quantitative and the qualitative scoring in
this study, 60 students’ testing booklets were randomly selected (30 Chinese booklets and 30 U.S. booklets) and
independently coded by two raters, both of whom were literate in Chinese and English. Inter-rater agreement for the
cognitive analysis ranged from 87% to 100%, and inter-rater agreement for the holistic scoring ranged from 84% to 91%.
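
The text does not state how agreement was computed. As a minimal sketch, assuming simple percent agreement (the share of booklets to which both raters assigned the same code or score), the calculation could look like the following. The function name `percent_agreement` and the score lists are hypothetical and are not data from the study.

```python
from typing import Hashable, Sequence


def percent_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Return the percentage of items on which two raters gave identical codes.

    Assumption: agreement means an exact match on each booklet; the study
    may have used a different definition.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same number of booklets.")
    if len(rater_a) == 0:
        raise ValueError("At least one coded booklet is required.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical holistic scores for ten booklets (illustrative only).
rater_a_scores = [4, 3, 2, 4, 1, 0, 3, 2, 4, 1]
rater_b_scores = [4, 3, 2, 3, 1, 0, 3, 2, 4, 1]

agreement = percent_agreement(rater_a_scores, rater_b_scores)
print(f"Inter-rater agreement: {agreement:.0f}%")
```

Percent agreement is easy to interpret but does not correct for matches that would occur by chance; a chance-corrected statistic such as Cohen's kappa is a common complement when the number of score categories is small.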