a good proxy for measuring performance, since it faithfully
represents the core functionality of the evaluated system.
7.1.3 Retrieval Methods
In this paper we presented two possible ways of integrating
topic retrieval into the related video suggestion system.
First, in Section 3 we discussed a retrieval algorithm that
assigns weights to topics using co-occurrence-based heuristics.
Second, in Section 4 we presented a novel algorithm
for directly learning weights on topic transitions.
We evaluate both of these methods by integrating them
into the general related video suggestion system architecture
as described in Figure 4. First, the highest ranked results
produced by one of the two proposed retrieval methods are
introduced into the reranking model. Then we measure the
changes in the overall system performance, using either a
simulation experiment (Section 7.2) or an experiment using
live traffic (Section 7.3).
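
The integration step can be illustrated with a minimal sketch. It assumes
that each retrieval method returns a ranked list of video identifiers and
that the reranking model can be treated as a scoring function. The names
merge_candidates, rerank, and score are hypothetical and are not taken from
the system itself.

```python
from typing import Callable, List


def merge_candidates(co_view: List[str], topic: List[str], top_n: int) -> List[str]:
    """Add the top_n topic-retrieved videos to the co-view candidates.

    Videos already present in the co-view list are not added twice.
    """
    seen = set(co_view)
    merged = list(co_view)
    for video in topic[:top_n]:
        if video not in seen:
            seen.add(video)
            merged.append(video)
    return merged


def rerank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Order candidates by descending score.

    `score` is a stand-in for the reranking model (an assumption).
    """
    return sorted(candidates, key=score, reverse=True)


# Toy data: identifiers and scores are made up for illustration.
toy_scores = {"a": 0.2, "b": 0.9, "c": 0.5}
pool = merge_candidates(["a", "b"], ["b", "c", "d"], top_n=2)
ranked = rerank(pool, lambda v: toy_scores.get(v, 0.0))
```

Here only the two highest ranked topic results are considered, so "d" never
reaches the reranking stage, and "b" is kept once because co-view retrieval
already returned it.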
In the next sections, we refer to the retrieval algorithm
presented in Section 3 as IRTopics, since it makes use of
information retrieval heuristics. We refer to the retrieval
algorithm from Section 4 as TransTopics, since it is based
on learning transitions between the topics.
7.2 User Simulation
In this section we describe a user simulation method for
estimating the performance of our retrieval methods. We exploit
the reranking model described in Section 5 to simulate
user interaction with the system.
Since the reranking model is trained to optimize the system
performance (in terms of click-through rate and watch
time) on live traffic, we use it to simulate the behavior of a typical
user in the system. Then we measure how many of the
results returned by the topic retrieval method will be added
by the simulated user to the top related results, compared
to a system that only uses co-view retrieval. By system design,
if no results from the topic retrieval are selected by
the simulated user, there is no benefit from performing this
retrieval, since none of the results will be shown to the real
users.
There are two things we are interested in measuring. First,
we measure how many new results our method introduces
that were not previously returned by the co-view retrieval
approach. Second, and more importantly, we want to observe
how many of these results are actually considered
relevant related videos by the simulated user (i.e., positioned
at high ranks by the reranking model).
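
The two measurements can be sketched for a single watch video as follows.
This is an illustration under stated assumptions, not the actual evaluation
code: the function name simulation_metrics is hypothetical, score stands in
for the reranking model of Section 5, and k is the rank cutoff.

```python
from typing import Callable, List, Tuple


def simulation_metrics(
    co_view: List[str],
    topic: List[str],
    score: Callable[[str], float],
    k: int,
) -> Tuple[int, int, float]:
    """Return (new results, new results in the top k, fraction of top k that is new)."""
    co_view_set = set(co_view)
    # First measurement: topic results not already returned by co-view retrieval.
    new_videos = [v for v in dict.fromkeys(topic) if v not in co_view_set]
    # The simulated user ranks the combined candidate pool with the scoring function.
    pool = list(dict.fromkeys(co_view)) + new_videos
    top_k = sorted(pool, key=score, reverse=True)[:k]
    # Second measurement: new results that the simulated user places in the top k.
    new_in_top_k = sum(1 for v in top_k if v not in co_view_set)
    fraction = new_in_top_k / len(top_k) if top_k else 0.0
    return len(new_videos), new_in_top_k, fraction


# Toy data: identifiers and scores are made up for illustration.
toy_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7, "e": 0.2}
n_new, n_new_top, frac_new = simulation_metrics(
    ["a", "b", "c"], ["c", "d", "e"], lambda v: toy_scores[v], k=3
)
```

In this toy case topic retrieval contributes two new videos, "d" and "e",
but only "d" is scored highly enough to enter the top three. Averaging the
fraction over a sample of watch videos would give a percentage of the kind
reported per rank cutoff.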
We run the user simulation for a large sample of videos.
Figure 5(a) shows the percentage of new videos ranked among
the top-K results by the simulated user that were intro-