Here we demonstrate how to efficiently bound the precision of a classifier using minimal amounts of labeled data by
adapting the techniques of stratified sampling [13, 9] to the
problem of classifier evaluation. In particular, we demonstrate that the output score of the classifier serves as a good
basis for stratification by identifying regions of similar classifier behavior because of the typical monotonic relationship
between classifiers and the true class-conditional posterior
[2]. Given a stratification into regions or strata, there is
an optimal strategy [13, 9] for distributing the number of
samples across these strata. However, this optimal strategy
relies on knowing the variance of the classifier within each
stratum, which would preclude the need for evaluation
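
As an illustration of the allocation problem described above, the following minimal sketch assumes that the optimal strategy referred to is the classical Neyman allocation from the stratified sampling literature, in which the number of samples drawn from stratum h is proportional to N_h * sigma_h. The function names, and the use of sqrt(p_h * (1 - p_h)) as the within-stratum standard deviation of a 0/1 correctness label, are assumptions made for this example and are not taken from the text. Note that the sketch takes the per-stratum precisions as input, which is exactly the information that is unavailable before evaluation.

```python
import math


def neyman_allocation(stratum_sizes, stratum_precisions, budget):
    """Split a labeling budget across strata in proportion to N_h * sigma_h.

    Assumption: each label is a 0/1 indicator of a correct prediction, so the
    within-stratum standard deviation is sqrt(p_h * (1 - p_h)).
    Returns fractional allocations; rounding to integers is left to the caller.
    """
    weights = [
        n * math.sqrt(p * (1.0 - p))
        for n, p in zip(stratum_sizes, stratum_precisions)
    ]
    total = sum(weights)
    if total == 0:
        # Every stratum has zero variance: fall back to proportional allocation.
        total_n = sum(stratum_sizes)
        return [budget * n / total_n for n in stratum_sizes]
    return [budget * w / total for w in weights]


def stratified_precision(stratum_sizes, sample_precisions):
    """Combine per-stratum precision estimates, weighting each by stratum size."""
    total_n = sum(stratum_sizes)
    return sum(n * p for n, p in zip(stratum_sizes, sample_precisions)) / total_n


# Hypothetical example: three equal-sized strata ordered by classifier score.
sizes = [1000, 1000, 1000]
precisions = [0.5, 0.9, 0.99]  # assumed known here only for illustration
allocation = neyman_allocation(sizes, precisions, budget=300)
estimate = stratified_precision(sizes, precisions)
```

Under these assumed values, the stratum whose precision is closest to 0.5 receives the largest share of the budget, since its labels are the most variable, while the nearly pure high-score stratum receives the smallest.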