Where did this optimization come from? The impatient reader will have to
jump ahead to the explanation of the general SVM classifier in Chapter 9. For the
time being, we can say that the SVM algorithm will find a classifier (i.e., the vector
$\vec{w}$) that has the following property. Each pair of documents in our training data
can be represented by the difference vector $(\vec{d}_i - \vec{d}_j)$. If we compute the score for this pair as
$\vec{w} \cdot (\vec{d}_i - \vec{d}_j)$, the SVM classifier will find a $\vec{w}$ that makes the smallest score as large
as possible. The same is true for the negative examples (pairs of documents in the
wrong order, represented by the reversed differences $(\vec{d}_j - \vec{d}_i)$). This means that the classifier will make the differences
in scores as large as possible for the pairs of documents that are hardest to rank.
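
To make this pairwise reduction concrete, here is a minimal sketch of Ranking SVM as classification over difference vectors, using scikit-learn's LinearSVC; the document vectors and preference pairs are invented for illustration, and this is not the book's implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical feature vectors for three documents, plus preference
# pairs (i, j) meaning document i should be ranked above document j.
docs = np.array([[0.9, 0.2],
                 [0.5, 0.4],
                 [0.1, 0.8]])
prefs = [(0, 1), (1, 2)]

# Reduce ranking to classification: each (d_i - d_j) is a positive
# example, and the reversed difference (d_j - d_i) a negative one.
X = np.array([docs[i] - docs[j] for i, j in prefs] +
             [docs[j] - docs[i] for i, j in prefs])
y = np.array([1] * len(prefs) + [-1] * len(prefs))

# A linear SVM with no intercept (the data is symmetric about the
# origin) finds the w that makes the smallest score w.(d_i - d_j)
# as large as possible.
svm = LinearSVC(fit_intercept=False, C=1.0)
svm.fit(X, y)
w = svm.coef_[0]

# New documents are then ranked by their individual scores w.d.
scores = docs @ w
print(np.argsort(-scores))  # document indices in ranked order
```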
Note that this model does not specify the features that should be used. It could
even be used to learn the weights for features corresponding to scores from completely
different retrieval models, such as BM25 and language models. Combining the results of
multiple searches for a given query has been shown to be effective in a number
of experiments, and is discussed further in Section 10.5.1. It should also be noted
that the weights learned by Ranking SVM (or some other discriminative technique)
can be used directly in the inference network query language.
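
As a sketch of that last point, the snippet below treats each document's feature vector as a list of scores from different retrieval models and combines them with weights assumed to have been learned by Ranking SVM; every number here is invented for illustration.

```python
# Each feature vector holds scores from different retrieval models for
# one document, e.g. [BM25 score, language model (query likelihood) score].
# All values are invented for illustration.
features = {
    "doc_a": [12.4, -4.1],
    "doc_b": [10.9, -3.2],
    "doc_c": [11.7, -3.8],
}

# Weights assumed to have been learned by Ranking SVM (hypothetical values).
w = [0.7, 0.3]

# The combined score is just a weighted sum, which is why learned weights
# can be plugged into a query language that supports weighted combination
# of evidence, such as an inference network's weighted operators.
combined = {doc: sum(wi * fi for wi, fi in zip(w, f))
            for doc, f in features.items()}

for doc in sorted(combined, key=combined.get, reverse=True):
    print(doc, round(combined[doc], 3))
```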