The item difficulties of two mathematics scales can be approximated by a two-dimensional
Rasch model (Bond & Fox, 2001). This model allowed us to construct parallel test versions (with no item overlap) for each scale (here, performance in solving modelling or intra-mathematical
problems) at two measurement points (pre- and posttest) so that students’ achievements could be compared. Students’ performance in solving modelling problems and intra-mathematical problems was examined using 18 and 8 items, respectively. As each student solved similar but not identical items at pretest and at posttest, students’ performance could be measured accurately, and memorisation effects were minimised. The ConQuest software (Wu, Adams, & Wilson, 1998) was used to scale students’ performance data. Weighted likelihood estimator (WLE) parameters (Warm, 1989) were estimated for each student. The WLEs characterise students’ performance using continuous scales for modelling and for intra-mathematical performance.
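
To illustrate how a WLE person parameter can be obtained, the following minimal sketch solves Warm's (1989) weighted likelihood equation for a single student on a single scale. It is a simplified illustration and not the ConQuest procedure used in the study: it assumes a unidimensional dichotomous Rasch model, treats the item difficulties as already known, and assumes they lie well inside the search interval of -10 to 10 logits. The function names (`rasch_prob`, `wle_score`, `wle_estimate`) and the example difficulties are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def wle_score(theta, responses, difficulties):
    """Warm's weighted likelihood equation for the Rasch model.

    The WLE is the value of theta at which this function equals zero:
    raw score - expected score + J / (2 * I).
    """
    probs = [rasch_prob(theta, b) for b in difficulties]
    info = sum(p * (1 - p) for p in probs)               # test information I
    j = sum(p * (1 - p) * (1 - 2 * p) for p in probs)    # bias-correction term J
    return sum(responses) - sum(probs) + j / (2 * info)


def wle_estimate(responses, difficulties, lo=-10.0, hi=10.0, iters=100):
    """Find a root of the WLE equation by bisection on [lo, hi].

    Assumption: the equation is positive at lo and negative at hi, which
    holds when the item difficulties lie well inside the interval.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if wle_score(mid, responses, difficulties) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical example: two items with difficulties -1 and +1 logits.
difficulties = [-1.0, 1.0]
theta_mixed = wle_estimate([1, 0], difficulties)     # one item solved
theta_perfect = wle_estimate([1, 1], difficulties)   # both items solved
theta_zero = wle_estimate([0, 0], difficulties)      # no item solved
```

Unlike the maximum likelihood estimate, the WLE stays finite for students who solve all or none of the items, so every student can be placed on the continuous scale.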