1. What would be the impact of increasing the number of tasks from 1 to 12?
2. What would be the impact of increasing the number of ratings per essay from 1 to 2?
3. Are the universe (or true) score correlations among the three speaking subsections high
enough to justify combining them into a single composite score?
4. What combinations of task-type subsection lengths for fixed total lengths (e.g., 5 tasks)
would maximize the composite score reliability for speaking?
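
Questions 1 and 2 are the kind usually addressed by projecting a generalizability coefficient over different numbers of tasks and ratings. The sketch below illustrates that projection under a loud assumption: a fully crossed person x task x rating design, which may not match the design actually used in this study. The function name and the variance components are hypothetical placeholders chosen only for illustration; they are not estimates from this study.

```python
def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_ratings):
    """Generalizability coefficient for relative decisions.

    Assumes a fully crossed person x task x rating design (an assumption,
    not necessarily the design of the study).
    """
    # Relative error variance: each interaction component is divided by
    # the number of conditions it is averaged over.
    rel_error = (
        var_pt / n_tasks
        + var_pr / n_ratings
        + var_ptr_e / (n_tasks * n_ratings)
    )
    return var_p / (var_p + rel_error)


# Hypothetical variance components (illustrative only, not study results).
components = dict(var_p=0.5, var_pt=0.2, var_pr=0.02, var_ptr_e=0.3)

# Project the coefficient for 1 to 12 tasks with 1 or 2 ratings per response.
for n_ratings in (1, 2):
    for n_tasks in (1, 2, 5, 12):
        rho = g_coefficient(n_tasks=n_tasks, n_ratings=n_ratings, **components)
        print(f"tasks={n_tasks:2d} ratings={n_ratings} E(rho^2)={rho:.3f}")
```

Under these assumptions, adding tasks shrinks the person-by-task and residual error terms, while adding ratings shrinks the person-by-rating and residual terms; which change helps more depends on the relative size of the estimated components.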