In addition, we assume that both the documents in the
index databases and the user queries are represented as
vectors.
In each document vector,
the weight of
a term is defined as the raw
term frequency of the term divided by the maximum
raw term frequency among all the terms in the document. The similarity between any two vectors is the
cosine of the angle between them, that is, the two vectors'
dot product divided by the product of their norms.
If
$x = (x_1, \ldots, x_n)$ is a vector, the norm of $x$ is usually
defined as
$(x_1^2 + \cdots + x_n^2)^{1/2}$. When we cluster the documents, the similarity between two document vectors
is computed. Similarly, when we answer a user query,
the similarity between the query vector and
a document vector representing the document to be retrieved
is computed.
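
The weighting and similarity definitions above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names (`term_weights`, `norm`, `cosine_similarity`), the representation of a vector as a dictionary from terms to weights, and the whitespace tokenization are all assumptions made for illustration. The text defines the term weight only for document vectors; applying the same weighting to the query, and returning 0 when either vector is all zeros, are further assumptions.

```python
import math
from collections import Counter


def term_weights(tokens):
    """Weight each term by its raw frequency divided by the
    maximum raw frequency among all terms in the document."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_tf = max(counts.values())
    return {term: tf / max_tf for term, tf in counts.items()}


def norm(vec):
    """Euclidean norm: (x_1^2 + ... + x_n^2)^(1/2)."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    denom = norm(u) * norm(v)
    # Assumption: similarity with an all-zero vector is taken to be 0.
    return dot / denom if denom else 0.0


# Hypothetical document and query, tokenized by whitespace.
doc = term_weights("index query index cluster".split())
query = term_weights("index query".split())
similarity = cosine_similarity(doc, query)
```

The same `cosine_similarity` function serves both uses described above: comparing two document vectors during clustering, and comparing a query vector with a document vector during retrieval.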