The purpose of the study was to examine (a) the relative effects of tasks and raters on examinees' speaking scores based on integrated and independent tasks, and (b) the impact of subsection lengths, as well as the number of tasks and raters, on the dependability of speaking scores within the generalizability theory (G-theory) framework. It was found that (a) the largest portion of error variance was related to tasks rather than raters, (b) increasing the number of tasks had a relatively large impact on score dependability up to a point of diminishing returns, (c) the high universe score correlations among the three subsections provided justification for combining the task-type subsection scores into a single composite score, and (d) slightly larger gains in composite score reliability were achieved when the number of LS (listening-speaking) tasks was increased.
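
As a rough illustration of why adding tasks improves dependability only up to a point, consider the dependability index for a fully crossed persons × tasks × raters random-effects design. The crossed design and the notation below are assumptions made here for illustration; the study's actual design may differ.

\[
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_t}{n_t} + \dfrac{\sigma^2_r}{n_r} + \dfrac{\sigma^2_{pt}}{n_t} + \dfrac{\sigma^2_{pr}}{n_r} + \dfrac{\sigma^2_{tr}}{n_t n_r} + \dfrac{\sigma^2_{ptr,e}}{n_t n_r}}
\]

Here \(\sigma^2_p\) is the universe-score (person) variance, the remaining terms are the error variance components, and \(n_t\) and \(n_r\) are the numbers of tasks and raters. Every task-related component is divided by \(n_t\), so each additional task removes a smaller share of error than the one before it, while the components divided by \(n_r\) alone are unaffected by adding tasks.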