The disparity in the number of trials across tasks also makes it difficult to compare the tasks in terms of their reliability. Although reliability is related to the number of trials in a task, tasks with similar numbers of trials (such as the n-back tasks) at times show considerable variation in their reliability coefficients. The test–retest reliability of the event trials of the n-back tasks, for example, varies between .213 and .603. Fortunately, the Monte-Carlo analysis helps to make sense of these findings. Fig. 1 shows that as data aggregation increases, there is both an increase in the average reliability estimate and a decrease in the variability of these estimates. Put another way, not only are measures of ISV based on few trials unreliable, but the estimates of their reliability are themselves more variable. This suggests that direct comparisons of reliability between the tasks used here may be unwise. It further suggests that ISV measures based on few trials should be treated with caution, even when their reliability estimates seem promising.
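The pattern described for Fig. 1 can be illustrated with a minimal simulation sketch. It is not the Monte-Carlo procedure used in this study: it draws synthetic reaction times rather than resampling observed trials, and every name and parameter value (`simulate_reliability`, 40 subjects, true standard deviations between 50 and 150 ms, a 500 ms mean) is a hypothetical choice made only for illustration. ISV is taken here as the within-subject standard deviation of reaction time, and test–retest reliability as the Pearson correlation of ISV across two simulated sessions.

```python
import numpy as np


def simulate_reliability(n_trials, n_subjects=40, n_reps=300, seed=0):
    """Return (mean, SD) of simulated test-retest reliability estimates of ISV.

    All parameter values are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    for rep in range(n_reps):
        # Hypothetical "true" ISV of each subject: SD of reaction time in ms.
        true_sd = rng.uniform(50.0, 150.0, size=n_subjects)
        # Reaction times for two sessions: shape (session, subject, trial).
        rts = rng.normal(
            loc=500.0,
            scale=true_sd[None, :, None],
            size=(2, n_subjects, n_trials),
        )
        # Observed ISV: sample SD over the trials of each subject and session.
        isv = rts.std(axis=2, ddof=1)
        # Test-retest reliability: correlation of ISV between the two sessions.
        estimates[rep] = np.corrcoef(isv[0], isv[1])[0, 1]
    return estimates.mean(), estimates.std(ddof=1)


if __name__ == "__main__":
    for n_trials in (5, 20, 80):
        mean_r, sd_r = simulate_reliability(n_trials)
        print(f"trials={n_trials:3d}  mean r={mean_r:.3f}  SD of r={sd_r:.3f}")
```

Under these assumptions, the mean reliability estimate rises and its spread across replications shrinks as the number of trials grows, which is the qualitative pattern described above. The absolute values depend entirely on the assumed parameters and should not be read as estimates for any of the tasks used here.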