where idf_k is the inverse document frequency weight for term k, N is the number
of documents in the collection, and n_k is the number of documents in which term
k occurs. The form of this weight was developed by intuition and experiment,
although an argument can be made that idf measures the amount of information
carried by the term, as defined in information theory (Robertson, 2004).
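
As a rough illustration of the definitions above, the following sketch computes idf weights for a toy collection. It assumes the common form idf_k = log(N / n_k) with a natural logarithm. The function name idf, the variable names, and the four-document collection are hypothetical and chosen only for this example. Under this form, idf_k equals -log(n_k / N), the negative log of the fraction of documents that contain term k, which is one way to read the information-theoretic argument.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency, assuming the form log(N / n_k)."""
    if doc_freq <= 0 or doc_freq > num_docs:
        raise ValueError("doc_freq must satisfy 0 < n_k <= N")
    return math.log(num_docs / doc_freq)


# Hypothetical toy collection: each document is a list of terms.
docs = [
    ["tropical", "fish", "tank"],
    ["fish", "food"],
    ["tropical", "storm"],
    ["fish", "market"],
]

N = len(docs)

# n_k: the number of documents in which term k occurs
# (each document counts at most once per term).
doc_freq = {}
for doc in docs:
    for term in set(doc):
        doc_freq[term] = doc_freq.get(term, 0) + 1

weights = {term: idf(N, n_k) for term, n_k in doc_freq.items()}

# Print terms from most common (lowest idf) to rarest (highest idf).
for term, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{term}: n_k={doc_freq[term]}, idf={weight:.3f}")
```

In this toy collection, "fish" occurs in three of the four documents and receives the lowest weight, while terms that occur in only one document receive the highest. A term that occurs in every document would get a weight of log(1) = 0.
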
The effects of these two weights are combined by multiplying them (hence
the name tf.idf). The reason for combining them this way is, once again, mostly
empirical. Given this, the typical form of document term weighting in the vector
space model is: