The form of query term weighting is essentially the same. Adding 1 to the term frequency
component ensures that terms with frequency 1 have a non-zero weight.
Note that, in this model, term weights are computed only for terms that occur in
the document (or query). Because the cosine normalization is already incorporated
into the weights, the score for a document is simply the dot product of the
document and query vectors.
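As an illustration of the scoring step just described, the following minimal sketch computes length-normalized weights for the terms that occur in a document or query, then scores the document with a plain dot product. It assumes a weight of the form (log f + 1) multiplied by log(N/n), divided by the vector length; the exact weighting formula, the function names weight_vector and score, and the sample counts are assumptions made for this example rather than details taken from the text.

```python
import math


def weight_vector(term_counts, doc_freq, num_docs):
    """Return length-normalized weights for the terms that actually occur.

    term_counts: term -> frequency in this document (or query)
    doc_freq:    term -> number of documents containing the term
    num_docs:    total number of documents in the collection
    Assumed weight form: (log(f) + 1) * log(N / n), then cosine-normalized.
    """
    raw = {
        term: (math.log(count) + 1.0) * math.log(num_docs / doc_freq[term])
        for term, count in term_counts.items()
        if count > 0  # weights are computed only for terms that occur
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        # Every term appears in every document, so all weights are zero.
        return {term: 0.0 for term in raw}
    return {term: w / norm for term, w in raw.items()}


def score(doc_weights, query_weights):
    """Dot product of two sparse, already-normalized weight vectors."""
    return sum(
        w * doc_weights[term]
        for term, w in query_weights.items()
        if term in doc_weights
    )


# Hypothetical collection statistics, for illustration only.
doc_freq = {"tropical": 2, "fish": 5, "tank": 10}
doc = weight_vector({"tropical": 1, "fish": 3}, doc_freq, num_docs=100)
query = weight_vector({"fish": 1, "tank": 1}, doc_freq, num_docs=100)
similarity = score(doc, query)
```

Because both vectors have unit length after normalization, the dot product here equals the cosine of the angle between them, and a term with frequency 1 still receives a positive weight because of the added 1.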
Although there is no explicit definition of relevance in the vector space model,
there is an implicit assumption that relevance is related to the similarity of query
and document vectors. In other words, documents “closer” to the query are more
likely to be relevant. This is primarily a model of topical relevance, although features
related to user relevance could be incorporated into the vector representation.
No assumption is made about whether relevance is binary or multivalued.
In the last chapter we described relevance feedback, a technique for query
modification based on user-identified relevant documents. This technique was
first introduced using the vector space model. The well-known Rocchio algorithm
(Rocchio, 1971) was based on the concept of an optimal query, which maximizes
the difference between the average vector representing the relevant documents
and the average vector representing the non-relevant documents. Given that only
limited relevance information is typically available, the most common (and effective)
form of the Rocchio algorithm modifies the initial weights in query vector
Q to produce a new query Q′ according to