Previous analysis suggested that at least 50 topics are needed to obtain stable effectiveness estimates. This number has been used for many years as a heuristic to guide test collection construction, even when the context of the test collection (including the type of search task being evaluated and the range of effectiveness metrics being used) has varied.