For the univariate analysis, three G-study designs were used to estimate the variance components for the speaking data: (a) a two-facet, partially nested design [(r:p) × t] with tasks (t) and raters (r) as random facets; (b) a two-facet crossed design (p × t × r′) with tasks (t) and ratings (r′) as random facets; and (c) a single-facet crossed design (p × t) with tasks (t) as the random facet, which used ratings averaged over two raters as the unit of analysis. The first two were the main comparison G-study designs in this study, used to investigate the relative effects of tasks and raters together (see also the previous section, “Investigation of Score Dependability: Generalizability Theory,” for the rationales for these two designs). The third design (p × t), however, was used to estimate internal consistency reliability coefficients (αT) for different section lengths when ratings averaged over two raters were used as the unit of analysis; thus, possible scores were 1.0, 1.5, …, 4.5, 5.0. In a single-facet design, a Cronbach