Abstract
A fundamental task in data analysis is understanding the differences between several contrasting
groups. These groups can represent different classes of objects, such as male or female
students, or the same group over time, e.g. freshman students in 1993 through 1998. We
present the problem of mining contrast sets: conjunctions of attributes and values that differ
meaningfully in their distribution across groups. We provide a search algorithm for mining
contrast sets with pruning rules that drastically reduce the computational complexity. Once
the contrast sets are found, we post-process the results to present a subset that is surprising
to the user given what we have already shown. We explicitly control the probability of Type
I error (false positives) and guarantee a maximum error rate for the entire analysis by using
Bonferroni corrections.
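
To make the notions above concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes two groups stored as lists of attribute-value dictionaries, tests a candidate contrast set with a Pearson chi-square test on a 2x2 table, and applies a plain Bonferroni correction that divides the significance level by the number of tests. The function names (`support`, `chi2_p_value_2x2`, `bonferroni_significant`) and the default `alpha=0.05` are illustrative assumptions; the search, pruning rules, and exact correction scheme used in the paper may differ.

```python
from math import erfc, sqrt


def support(rows, itemset):
    """Fraction of rows in which every (attribute, value) pair of itemset holds."""
    if not rows:
        return 0.0
    hits = sum(all(r.get(a) == v for a, v in itemset.items()) for r in rows)
    return hits / len(rows)


def chi2_p_value_2x2(a, b, c, d):
    """p-value of a Pearson chi-square test on the table [[a, b], [c, d]].

    Rows are "contrast set true / false", columns are the two groups.
    With one degree of freedom the survival function is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return erfc(sqrt(stat / 2))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at level alpha / (number of tests).

    Plain Bonferroni: this bounds the probability of any Type I error
    across all tests by alpha.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

For example, a candidate such as `{"major": "CS"}` would be counted in each group with `support`, its counts passed to `chi2_p_value_2x2`, and the resulting p-values for all candidates filtered together with `bonferroni_significant`.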