Evaluating classifiers has long been an important topic of investigation [2, 15, 10, 3, 5]. Within information retrieval, classifiers have proven useful for a variety of tasks, including routing, web junk and spam identification, accelerated searching, and filtering. However, to deploy classification technology within a larger retrieval system, it is important to bound
the performance of the classifier with high confidence.
Furthermore, the underlying collection changes over time,
which necessitates ongoing checks to verify that the
classifiers remain within tolerable performance ranges as determined