API sets. For different versions, the sets will be similar but not
identical. We run this analysis on components extracted from
all applications and then use the Jaccard distance to measure the
dissimilarity between API sets. If the distance between two
components is below a threshold (we used 0.2), we place them in
the same cluster.
Thus, packages of different ad libraries end up in different
clusters, and each cluster can then be easily mapped to an ad library.
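
The clustering step can be illustrated with a minimal sketch. The text fixes only the metric (Jaccard distance) and the threshold (0.2); the grouping strategy below (transitive, single-linkage merging via union-find), the function names, and the component and API names in the example are all assumptions made for illustration, not the implementation used in this work.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |intersection| / |union| for two sets of API names."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty API sets count as identical
    return 1.0 - len(a & b) / len(union)


def cluster_components(api_sets, threshold=0.2):
    """Group components whose API sets lie within `threshold` of each other.

    Assumption: merging is transitive (single linkage); the text does not
    say which clustering scheme was actually used.
    """
    parent = {name: name for name in api_sets}

    def find(x):
        # Follow parent links to the cluster representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(api_sets, 2):
        if jaccard_distance(api_sets[a], api_sets[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in api_sets:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical components: two versions of one library and one unrelated library.
base = {f"api_{i}" for i in range(10)}
components = {
    "com.adlib_a.v1": base,
    "com.adlib_a.v2": base | {"api_10"},  # distance to v1 is 1/11, below 0.2
    "com.adlib_b.v1": {f"other_{i}" for i in range(8)},
}
clusters = cluster_components(components)
```

Here the two versions of the hypothetical library share 10 of 11 APIs and fall into one cluster, while the unrelated library, with a disjoint API set (distance 1.0), forms its own. The pairwise comparison is quadratic in the number of components, which is acceptable for a sketch but may need indexing or hashing at scale.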