The second document has a higher score because it has a high weight for the first
term, which also has a high weight in the query. Even this simple example shows
that ranking based on the vector space model is able to reflect term importance
and the number of matching terms, which is not possible in Boolean retrieval.
In this discussion, we have yet to say anything about the form of the term
weighting used in the vector space model. Infact, many different weighting schemes
have been tried over the years. Most of these are variations on tf.idf weighting,
which was described briefly in Chapter 2. The term frequency component, tf, reflects
the importance of a term in a document Di (or query). This is usually computed
as a normalized count of the term occurrences in a document, for example
by