The equation above provides an unbiased estimator of the squared standard error of measurement for a person with an observed score of x on a test with n items (see Lord, 1984).
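The estimator can be sketched in a few lines of Python. This sketch assumes that the equation above is Lord's binomial-error estimator of the squared SEM, x(n − x)/(n − 1), for an observed number-correct score x on an n-item test. The function name `lord_binomial_sem` and the 40-item test length are hypothetical and used only for illustration.

```python
import math


def lord_binomial_sem(x: int, n: int) -> float:
    """Conditional SEM under a binomial error model (illustrative sketch).

    Assumes the squared SEM for observed number-correct score x on an
    n-item test is estimated by x * (n - x) / (n - 1); the square root
    of that quantity is returned.
    """
    if n < 2:
        raise ValueError("the test needs at least two items")
    if not 0 <= x <= n:
        raise ValueError("the score must lie between 0 and n")
    return math.sqrt(x * (n - x) / (n - 1))


# Conditional SEMs at several scores on a hypothetical 40-item test
for score in (5, 10, 20, 30, 35):
    print(score, round(lord_binomial_sem(score, 40), 2))
```

Because x(n − x) is largest at x = n/2, the estimated SEM peaks in the middle of the score range and falls to zero at x = 0 and x = n.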
All but one of the lines in Figure 1 are rather smooth. The irregular line portrays the Thorndike estimates: it starts on the left with a dramatic dip, jumps around in a jagged manner, and eventually settles into something resembling a curve after we pass the center of the score intervals. (Note: the Thorndike estimates exceed a standard error of 3.00 for two of the score intervals; the only other methods with estimates above 3.00 for some of the intervals are Feldt’s method and the binomial error model. The line corresponding to estimates derived from the three-parameter item response model starts on the left just under the 2.50 standard error line.)
A limitation of the Qualls-Payne study relates to sample size: fewer than 400 students were tested.
The sample sizes for several of the score intervals were meager; each of the first three intervals held fewer than 20 examinees, and each of the next four held fewer than 40. Thorndike’s procedure produced a poor “curve” in Figure 1, but there is evidence that it can produce stable estimates given an adequate sample size. For example, Figure 2 displays a graph of results from Table 1 of Feldt (1984).
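The sample-size problem can be illustrated with a small simulation. The sketch below assumes our reading of Thorndike's procedure as a difference method: the test is split into two parallel halves, and within each total-score interval the standard deviation of the half-test differences serves as the SEM estimate. The helpers `thorndike_sem_by_interval` and `simulate_halves`, the 40-item test, the interval width of 5, and the binomial response model with uniformly distributed examinee success probabilities are all hypothetical; the data are simulated and are not the Qualls-Payne data.

```python
import random
import statistics
from collections import defaultdict


def thorndike_sem_by_interval(half_a, half_b, width=1):
    """Difference-method estimate of the conditional SEM (sketch).

    Each examinee's total score is a + b. Within each total-score interval
    of the given width, the standard deviation of the half-test differences
    (a - b) is used as the SEM estimate, assuming the halves are parallel.
    Returns {interval lower bound: (number of examinees, SEM estimate)}.
    """
    groups = defaultdict(list)
    for a, b in zip(half_a, half_b):
        groups[(a + b) // width].append(a - b)
    return {
        k * width: (len(d), statistics.stdev(d))
        for k, d in sorted(groups.items())
        if len(d) >= 2  # a standard deviation needs at least two examinees
    }


def simulate_halves(num_examinees, items_per_half, seed):
    """Hypothetical data: each examinee answers every item correctly with
    an individual probability p drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for _ in range(num_examinees):
        p = rng.random()
        half_a.append(sum(rng.random() < p for _ in range(items_per_half)))
        half_b.append(sum(rng.random() < p for _ in range(items_per_half)))
    return half_a, half_b


# Hypothetical 40-item test (two 20-item halves), small vs. large sample
for size in (300, 20000):
    a, b = simulate_halves(size, items_per_half=20, seed=1)
    print(f"N = {size}")
    for lower, (count, sem) in thorndike_sem_by_interval(a, b, width=5).items():
        print(f"  scores [{lower}, {lower + 5}): n = {count:5d}, SEM = {sem:.2f}")
```

With N = 300, each width-5 interval holds only a few dozen simulated examinees, and the estimates tend to scatter more than those from the large sample. Under this simulated binomial model, the expected squared half-test difference at an exact total score x equals x(n − x)/(n − 1), so with enough examinees per score the difference method and the binomial estimator target the same quantity.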