Despite recent momentum and interest in this area, interaction in speaking tests remains a relatively young research domain, and few studies have targeted interaction in assessment specifically. The researcher therefore cast a wide net at the outset, including dissertations, conference proceedings, and unpublished research reports alongside published studies. As Table 1 shows, the overall effect size of published studies (d = 1.29) is substantially larger than that of unpublished ones (d = 0.80). This difference is statistically significant, as indicated by the non-overlapping confidence intervals of the two groups.
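The non-overlapping-intervals argument can be made concrete with a short sketch. It assumes each subgroup mean effect size comes with a standard error (not reported in this section) and uses normal-theory 95% intervals. It also computes a z statistic for the difference between two independent subgroup means, which is a more direct test. The function names, the 1.96 critical value, and the example numbers are hypothetical illustrations, not the procedure or data of this study. Non-overlap of two 95% intervals is a conservative criterion: intervals can overlap somewhat while the difference is still significant.

```python
import math

Z_95 = 1.96  # two-sided 95% normal critical value (assumed normal approximation)


def confidence_interval(d: float, se: float, z: float = Z_95) -> tuple[float, float]:
    """Normal-theory confidence interval for a mean effect size d with standard error se."""
    return (d - z * se, d + z * se)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def difference_z(d1: float, se1: float, d2: float, se2: float) -> float:
    """z statistic for the difference between two independent subgroup means."""
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Toy values for illustration only (hypothetical; not taken from Table 1).
group_a = (1.2, 0.1)  # (mean d, standard error)
group_b = (0.8, 0.1)
ci_a = confidence_interval(*group_a)
ci_b = confidence_interval(*group_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
print(difference_z(*group_a, *group_b))
```

Reporting the z statistic alongside the intervals avoids relying on the visual overlap rule alone.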
The distribution of effect sizes in the funnel plot (see Figure 1) further suggests a bias toward statistically significant results: slightly more effect sizes fall to the right of the unweighted mean effect size (d = 0.99) than to its left. In the absence of bias, effect sizes would be distributed more evenly on both sides of the mean. Note also that effect sizes from studies with larger samples clustered largely to the left of the aggregated mean, whereas those from smaller samples clustered to the right. This asymmetry in the funnel plot may, however, also be caused by other variables, such as the moderators discussed in detail later.
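The informal funnel-plot reasoning can be sketched in code under some assumptions. The sketch assumes each study supplies an effect size d and its standard error. It counts effect sizes on either side of the unweighted mean and computes an inverse-variance weighted mean. Larger studies have smaller variances and so receive more weight, which means a weighted mean below the unweighted mean is one symptom of the pattern described. The sketch also adds Egger's regression test, a common formal check for funnel-plot asymmetry that is not reported in this section and is shown only as a swapped-in illustration. All function names are hypothetical.

```python
import math
from statistics import fmean


def sides_of_mean(effects):
    """Return the unweighted mean and counts of effect sizes strictly left and right of it."""
    m = fmean(effects)
    left = sum(1 for d in effects if d < m)
    right = sum(1 for d in effects if d > m)
    return m, left, right


def weighted_mean(effects, variances):
    """Fixed-effect (inverse-variance) weighted mean effect size."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


def egger_intercept(effects, ses):
    """Egger's regression: OLS of d/se on 1/se.

    Returns (intercept, standard error of the intercept). An intercept far
    from zero relative to its standard error indicates funnel-plot asymmetry.
    Requires at least three studies with differing standard errors.
    """
    y = [d / s for d, s in zip(effects, ses)]  # standardized effects
    x = [1.0 / s for s in ses]                 # precision
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return intercept, se_intercept
```

As the paragraph cautions, a nonzero Egger intercept signals small-study effects in general. It does not by itself distinguish publication bias from heterogeneity due to moderators.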