combines the advantages of the word-based similarity measures with the efficiency of fingerprints based on hashing.
It has the unusual property for a hashing function that similar documents have similar hash values.
More precisely, the similarity of two pages as measured by the cosine correlation measure is proportional to the number of bits that are the same in the fingerprints generated by simhash.
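
As a rough illustration of how such a fingerprint might be computed and compared, the following Python sketch makes several assumptions that the text above does not specify: words are weighted by their frequency in the document, MD5 serves as the per-word hash, and the fingerprint is 64 bits long. The function names `simhash` and `matching_bits` are hypothetical, chosen only for this example.

```python
import hashlib
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Return a simhash-style fingerprint of `text` as an integer.

    Assumptions (not taken from the text): whitespace tokenization,
    term-frequency weights, MD5 as the word hash, and `bits` being a
    multiple of 8 no larger than 128 (the size of an MD5 digest).
    """
    weights = Counter(text.lower().split())
    totals = [0] * bits  # one running sum per fingerprint bit

    for word, weight in weights.items():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        word_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Add the weight where the word's hash has a 1 bit,
            # subtract it where the hash has a 0 bit.
            if (word_hash >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight

    # Bit i of the fingerprint is 1 when the i-th sum is positive.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def matching_bits(a: int, b: int, bits: int = 64) -> int:
    """Count the bit positions in which two fingerprints agree."""
    return bits - bin(a ^ b).count("1")
```

Under these assumptions, two pages would be compared by calling `matching_bits(simhash(page_a), simhash(page_b))`; a count close to 64 suggests near-duplicate content, while unrelated pages would be expected to agree on roughly half of the bits.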