Given this representation, documents could be ranked by computing the distance
between the points representing the documents and the query. More commonly,
a similarity measure is used (rather than a distance or dissimilarity measure),
so that the documents with the highest scores are the most similar to the
query. A number of similarity measures have been proposed and tested for this
purpose. The most successful of these is the cosine correlation similarity measure.
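As an illustration, documents can be ranked by computing the cosine between each document vector and the query vector and then sorting by that score. The cosine is the dot product of the two vectors divided by the product of their lengths. The sketch below is a minimal Python version. It assumes dense lists of term weights over a shared vocabulary. The document IDs, the four-term vocabulary, and the weights are hypothetical. Returning 0 for an all-zero vector is a convention assumed here.

```python
import math

def cosine(doc, query):
    """Cosine of the angle between two term-weight vectors of equal dimension."""
    dot = sum(d * q for d, q in zip(doc, query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumed convention: an all-zero vector matches nothing
    return dot / (norm_d * norm_q)

def rank(docs, query):
    """Return (doc_id, score) pairs, most similar document first."""
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 4-term vocabulary and term weights, for illustration only.
docs = {
    "D1": [0.5, 0.8, 0.3, 0.0],
    "D2": [0.9, 0.4, 0.2, 0.0],
    "D3": [0.0, 0.0, 0.0, 0.7],
}
query = [1.5, 1.0, 0.0, 0.0]

for doc_id, score in rank(docs, query):
    print(doc_id, round(score, 3))
```

With these made-up weights, D2 ranks above D1. D3 scores 0 because it has no non-zero terms in common with the query. Scaling any vector by a positive constant leaves its score unchanged, because the cosine depends only on the angle.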
The cosine correlation measures the cosine of the angle between the query and
the document vectors. Because the cosine depends only on the angle and not on the
lengths of the vectors, it effectively normalizes documents and queries so that they
are represented by vectors of equal length. The cosine of the angle between two
identical vectors (or, more generally, two vectors pointing in the same direction)
will be 1, since the angle is zero. For two vectors that have no non-zero terms
in common, the cosine will be 0. The cosine measure is defined as: