C. Collisions
Collisions may occur when legitimate traffic happens to contain
some of the partial sensitive-data fingerprints by coincidence.
The rate of such collisions may increase with shorter shingles
or with fewer partial fingerprints, and may decrease
if additional features, such as the order of the fingerprints, are
used for detection. A previous large-scale information-retrieval
study empirically demonstrated a low rate of this type
of collision for Rabin fingerprints [18], which is a desirable
property that suggests few unwanted false alarms in our
DLD setting. Collisions in which two distinct shingles generate
the same fingerprint have been proved to be rare [17] and are
negligible.
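
The effect of shingle length on coincidental collisions can be illustrated with a minimal sketch. It is not the detection implementation described in this paper: the function names `shingle_fingerprints` and `coincidental_matches`, the sample strings, and the shingle lengths are hypothetical, and a truncated BLAKE2b hash is assumed as a stand-in for Rabin fingerprints. The sketch counts how many fingerprints of a sensitive string also appear in unrelated traffic for several shingle lengths.

```python
import hashlib


def shingle_fingerprints(data: bytes, k: int) -> set:
    """Return the set of 64-bit fingerprints of all k-byte shingles in data."""
    # Assumption: truncated BLAKE2b stands in for a Rabin fingerprint here.
    return {
        int.from_bytes(
            hashlib.blake2b(data[i:i + k], digest_size=8).digest(), "big"
        )
        for i in range(len(data) - k + 1)
    }


def coincidental_matches(sensitive: bytes, traffic: bytes, k: int) -> int:
    """Count sensitive-data fingerprints that also occur in the traffic."""
    return len(shingle_fingerprints(sensitive, k) & shingle_fingerprints(traffic, k))


# Hypothetical inputs: the traffic is unrelated to the sensitive record,
# so every shared fingerprint is a coincidental collision.
sensitive = b"patient record: blood type O negative, allergy to penicillin"
traffic = b"the weather record shows a negative trend in the blood orange harvest"

for k in (3, 5, 8, 12):
    print(f"shingle length {k}: {coincidental_matches(sensitive, traffic, k)} shared fingerprints")
```

In this toy setting, short shingles match common word fragments, whereas shingles longer than the longest substring shared by the two inputs cannot match at all except through a hash collision between distinct shingles.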