To obtain population performance parameters against which statistical assumptions can be tested, stochastic simulations are generated from four TREC collections. Two cases are then considered when examining how robust test collection reliability measures are to those assumptions: (1) an IR researcher has a test collection with a given number of sample topics and wants to estimate its reliability, and (2) an IR researcher has a test collection with a given number of sample topics and wants to estimate the reliability of a new collection whose sample topics are a different set drawn from the same topic population as the initial collection.
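The role of the simulated population can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the population is a hypothetical list of Gaussian per-topic score differences between two systems, the function names `simulate_population` and `sign_agreement` are invented, and sign agreement with the population mean is only a stand-in for the reliability measures examined. The point is that, once the population is known, a reference value can be computed for a collection of any size, against which an estimate made from a single sample could be checked.

```python
import random
import statistics

def simulate_population(n_topics, mean_diff, sd, rng):
    """Hypothetical population: per-topic score differences between two systems."""
    return [rng.gauss(mean_diff, sd) for _ in range(n_topics)]

def sign_agreement(population, sample_size, trials, rng):
    """Fraction of random topic samples whose mean difference has the
    same sign as the population mean (a stand-in reliability value)."""
    pop_positive = statistics.fmean(population) > 0
    hits = 0
    for _ in range(trials):
        # Draw a topic sample without replacement from the population.
        sample = rng.sample(population, sample_size)
        if (statistics.fmean(sample) > 0) == pop_positive:
            hits += 1
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
population = simulate_population(10_000, mean_diff=0.02, sd=0.10, rng=rng)

# Case 1 analogue: reference value for a collection of 50 topics.
case1 = sign_agreement(population, sample_size=50, trials=2000, rng=rng)
# Case 2 analogue: reference value for a new collection with other topics
# from the same population (here, illustratively, 100 of them).
case2 = sign_agreement(population, sample_size=100, trials=2000, rng=rng)

print(f"50 topics: {case1:.3f}, 100 topics: {case2:.3f}")
```

In this toy setup the larger sample agrees with the population more often, which is the kind of ground truth a reliability estimate computed from one observed sample would be compared against.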