Results
The preliminary comparison of data from Japan and Korea revealed some between-sample
differences. The Korean sample scored slightly higher on average (by approximately
a quarter of a standard deviation) on each of the TOEIC measures than did the
Japanese sample. Korean test takers were also slightly less variable on each measure.
With respect to the self-assessments, Korean test takers also rated themselves slightly
higher on average on each assessment than did Japanese test takers. These differences
were roughly commensurate in size, and consistent in direction, with the differences
between TOEIC scores for the two samples.
More importantly, the intercorrelations among the independent variables in our
regression analyses (i.e., scores from the four TOEIC tests) were very similar for the two
samples, ranging from .52 to .73 for the Korean sample (median r = .60) and from .58 to
.74 for the Japanese sample (median r = .62). The correlation between each TOEIC score
and the corresponding self-assessment for its domain was slightly higher in each case for
the Japanese sample than for the Korean sample. Consequently, pooling the samples yielded
in correlations for the combined sample that were approximately midway between those
of the separate country samples. Thus, this preliminary analysis did not, we believe,
reveal any between-sample differences that were substantial enough to contraindicate
pooling the data.
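The pooling argument above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the within-group correlations (.40 and .46), the quarter-SD mean shift, and the group sizes are hypothetical placeholders, not values from the study, and `simulate_group` is a helper introduced here for illustration.

```python
import numpy as np


def simulate_group(rng, n, r, shift):
    """Draw n (score, self-assessment) pairs with correlation r.

    Both variables have unit variance and a common mean of `shift`.
    All parameter values used below are hypothetical.
    """
    cov = [[1.0, r], [r, 1.0]]
    xy = rng.multivariate_normal([shift, shift], cov, size=n)
    return xy[:, 0], xy[:, 1]


rng = np.random.default_rng(1)

# Group A: reference group; group B: higher by a quarter SD on both variables.
x_a, y_a = simulate_group(rng, 5000, r=0.40, shift=0.00)
x_b, y_b = simulate_group(rng, 5000, r=0.46, shift=0.25)

r_a = np.corrcoef(x_a, y_a)[0, 1]
r_b = np.corrcoef(x_b, y_b)[0, 1]

# Pooled correlation: concatenate the two groups and correlate once.
r_pooled = np.corrcoef(
    np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b])
)[0, 1]

print(f"group A r = {r_a:.3f}, group B r = {r_b:.3f}, pooled r = {r_pooled:.3f}")
```

When the groups are of similar size and differ only modestly in their means, the pooled correlation falls close to the midpoint of the two within-group values. A mean difference in the same direction on both variables pushes the pooled value up slightly.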
Each of the four six-item self-assessments proved to be highly reliable. Cronbach’s
alpha internal consistency estimates were in the mid .90s. Internal consistency reliability
estimates in the low .90s have been reported elsewhere for the TOEIC Reading and
Listening tests, and test–retest correlations in the low to mid .80s have been reported for
the TOEIC Speaking and Writing measures (Liao, Qu, & Morgan, 2010).
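For readers unfamiliar with the statistic, the following is a minimal sketch of how Cronbach's alpha can be computed for a six-item scale. The ratings are simulated, not the study data; the noise level, sample size, and the function name `cronbach_alpha` are assumptions made for illustration.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


# Simulated ratings (hypothetical): one latent trait plus independent item noise.
rng = np.random.default_rng(0)
n_respondents, n_items = 500, 6
trait = rng.normal(size=(n_respondents, 1))
noise = rng.normal(scale=0.6, size=(n_respondents, n_items))
ratings = trait + noise

alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Alpha rises as the items share more common variance and as the number of items grows, which is why a short six-item scale with highly intercorrelated items can still reach values above .90.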
Table 1 shows the intercorrelations among TOEIC test scores and test takers’ self-assessments
in each domain. Each column in the table shows the correlations of a particular
TOEIC test score with each of the four self-assessment measures. The correlations
on the diagonal, mostly in the .40s, indicate significant moderate prediction of self-assessments
from corresponding TOEIC scores. The off-diagonal correlations are nearly
as strong, suggesting little discrimination among either the self-assessments or the
TOEIC measures. Thus, these zero-order correlations suggest relatively weak, if any,
discriminant validity for any of the TOEIC measures, at least for the self-assessment
criteria that we have used. This is especially noticeable for the TOEIC Writing measure,
which is the least reliable of the four measures.
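The comparison of diagonal and off-diagonal correlations described above can be sketched as follows. The matrix is a hypothetical placeholder chosen only to mimic the described pattern (diagonal values in the .40s, off-diagonal values nearly as large); it does not reproduce Table 1, and the domain ordering is an assumption.

```python
import numpy as np

# Hypothetical cross-correlation matrix, NOT the values in Table 1.
# Rows: self-assessments; columns: TOEIC scores.
# Assumed domain order for both: Listening, Reading, Speaking, Writing.
corr = np.array([
    [0.45, 0.40, 0.38, 0.36],
    [0.41, 0.44, 0.37, 0.35],
    [0.39, 0.38, 0.42, 0.37],
    [0.37, 0.36, 0.38, 0.40],
])

diagonal = np.diag(corr)
off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]

convergent = diagonal.mean()        # same-domain (diagonal) correlations
discriminant = off_diagonal.mean()  # cross-domain (off-diagonal) correlations

# For each TOEIC score (column), is the same-domain self-assessment
# the one it correlates with most strongly?
diag_is_column_max = corr.argmax(axis=0) == np.arange(corr.shape[1])

print(f"mean diagonal r     = {convergent:.3f}")
print(f"mean off-diagonal r = {discriminant:.3f}")
print(f"difference          = {convergent - discriminant:.3f}")
print("diagonal is column maximum:", diag_is_column_max.tolist())
```

A small gap between the two means corresponds to the weak discriminant validity noted in the text. Strong discrimination would show diagonal values clearly exceeding every off-diagonal value in the same row and column.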
Table 1 also contains the means and standard deviations of TOEIC scores for the
study sample. We note that the sample is substantially more proficient with respect to
Listening and Reading scores (M = 423 and 373) than is the corresponding worldwide
sample (M = 312 and 257) (www.ets.org/s/toeic/pdf/2012_ww_data_report_unlweb.pdf).
The range of proficiency in the study sample is also more restricted (SD = 63 and 78)
than in the worldwide sample (SD = 97 and 104). Thus, the correlations for our sample
can be expected to be lower than those in the general TOEIC population. Although no