Triplet Sampling
To avoid overfitting, it is desirable to utilize a large variety of images. However, the number of possible triplets
increases cubically with the number of images. It is computationally prohibitive and sub-optimal to use all the triplets.
For example, the training dataset in this paper contains 12
million images. The number of all possible triplets in this
dataset is approximately $(1.2 \times 10^{7})^{3} = 1.728 \times 10^{21}$. This
is an extremely large number that cannot be enumerated.
With the proposed triplet sampling algorithm, we
find that the optimization converges after about 24 million triplet
samples, far fewer than the number of possible
triplets in our dataset.
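
The scale of this gap can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `ordered_triplet_count` and the choice to count ordered triplets of distinct images are assumptions, not definitions from the paper, which uses the simpler $n^3$ estimate.

```python
def ordered_triplet_count(n: int) -> int:
    """Number of ordered triplets of distinct items drawn from n items.

    Assumption: a triplet is an ordered choice of three different images.
    """
    return n * (n - 1) * (n - 2)


n_images = 12_000_000        # dataset size stated in the text
sampled = 24_000_000         # triplets reported as sufficient for convergence

upper_bound = n_images ** 3                          # the (1.2e7)^3 estimate
exact_distinct = ordered_triplet_count(n_images)     # slightly below n^3

# Fraction of the n^3 estimate that is actually visited during training.
fraction = sampled / upper_bound

print(f"n^3 estimate:        {upper_bound:.3e}")
print(f"distinct triplets:   {exact_distinct:.3e}")
print(f"fraction sampled:    {fraction:.3e}")
```

Under either way of counting, the sampled triplets make up a vanishingly small fraction of the candidates, which is why the choice of which triplets to sample matters.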
It is crucial to choose an effective triplet sampling strategy to select the most important triplets for rank learning.
Uniform sampling of triplets is sub-optimal because
we are more interested in the top-ranked results returned by
the ranking model. In this paper, we employ an online importance sampling scheme to sample triplets.
Suppose we have a set of images $P$ and their pairwise
relevance scores $r_{i,j} = r(p_i, p_j)$. Each image $p_i$ belongs to
a category, denoted by $c_i$. Let the total relevance score $r_i$ of
an image $p_i$ be defined as
$$r_i = \sum_{j:\, c_j = c_i,\; j \neq i} r_{i,j} \qquad (6)$$
The total relevance score of an image $p_i$ reflects how relevant the image is to the other images in its category.
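
As a minimal sketch of Eq. (6), the total relevance scores can be computed by summing pairwise scores over same-category images. The function name `total_relevance`, the toy categories, and the toy relevance matrix are hypothetical and chosen only for illustration. The quadratic loop is written for clarity rather than for a dataset of millions of images.

```python
from typing import Callable, Hashable, List, Sequence


def total_relevance(
    categories: Sequence[Hashable],
    relevance: Callable[[int, int], float],
) -> List[float]:
    """Compute r_i = sum of r_{i,j} over all j != i with c_j == c_i."""
    n = len(categories)
    totals = [0.0] * n
    for i in range(n):
        for j in range(n):
            # Only other images in the same category contribute.
            if j != i and categories[j] == categories[i]:
                totals[i] += relevance(i, j)
    return totals


# Toy data (hypothetical): four images, three of them in category "cat".
toy_categories = ["cat", "cat", "dog", "cat"]
toy_scores = [
    [0.0, 0.9, 0.1, 0.5],
    [0.9, 0.0, 0.2, 0.3],
    [0.1, 0.2, 0.0, 0.4],
    [0.5, 0.3, 0.4, 0.0],
]

toy_totals = total_relevance(toy_categories, lambda i, j: toy_scores[i][j])
print(toy_totals)
```

In this toy case the only "dog" image has no same-category partner, so its total relevance score is zero.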
To sample a triplet, we first sample a query image $p_i$
from $P$ according to its total relevance score $r_i$. The probability of an image being chosen as query image is proportional
to its total relevance score.
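
The query-sampling step can be sketched as a weighted draw. This is an illustrative, in-memory version under stated assumptions, not the paper's online implementation: the function name `sample_query_index` and the toy scores are hypothetical, and the weighted draw uses the standard-library `random.Random.choices`.

```python
import random
from typing import Optional, Sequence


def sample_query_index(
    total_scores: Sequence[float],
    rng: Optional[random.Random] = None,
) -> int:
    """Draw an image index i with probability r_i / sum_k r_k."""
    if any(score < 0 for score in total_scores):
        raise ValueError("total relevance scores must be non-negative")
    if sum(total_scores) <= 0:
        raise ValueError("at least one total relevance score must be positive")
    if rng is None:
        rng = random.Random()
    # choices() normalizes the weights internally.
    return rng.choices(range(len(total_scores)), weights=total_scores, k=1)[0]


# Toy total relevance scores (hypothetical values).
toy_totals = [1.4, 1.2, 0.0, 0.8]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_query_index(toy_totals, rng) for _ in range(20_000)]

# Empirical frequencies should approach r_i / sum_k r_k.
freq = [draws.count(i) / len(draws) for i in range(len(toy_totals))]
expected = [score / sum(toy_totals) for score in toy_totals]
print(freq)
print(expected)
```

An image with a total relevance score of zero is never selected as a query under this scheme.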