of dimensions. In the end, we pick the N most similar
items from this list to build a group R of recommended items.
After that, our testing set is composed of the future views
of this same user group, which is used to check the
correctness of our method by comparing these views with the
recommended item set R.
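The selection step can be illustrated with a minimal sketch. All names below (user_vector, item_vectors, item_ids) are hypothetical, and we assume cosine similarity over the dimensions mentioned above; the similarity measure actually used by our technique may differ.

import numpy as np

def top_n_recommendations(user_vector, item_vectors, item_ids, n):
    # Cosine similarity between the user's profile vector and each
    # candidate item vector (one row of item_vectors per item).
    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(user_vector)
    sims = item_vectors @ user_vector / np.maximum(norms, 1e-12)
    # Indices of the N largest similarities, in descending order.
    top = np.argsort(-sims)[:n]
    return [item_ids[i] for i in top]  # the recommended group R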
This comparison is made using several metrics, and in
the next section we present the results for some of them.
The metrics reported in Section VI-C are a variation of
Precision, the Normalized Discounted Cumulative Gain, or
nDCG [12], and the Rank-Score [11], which extends the recall
metric to take the positions of correct items in a ranked list
into account.
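For concreteness, minimal implementations of the two ranking metrics are sketched below, following the standard formulations from [11], [12]; the binary-relevance assumption and the half_life parameter value are illustrative choices, not taken from our experiments.

import math

def ndcg(ranked_items, relevant, k=None):
    # Normalized Discounted Cumulative Gain with binary relevance:
    # a hit at position pos (0-indexed) contributes 1 / log2(pos + 2).
    ranked = ranked_items[:k] if k else ranked_items
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked) if item in relevant)
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def rank_score(ranked_items, relevant, half_life=5):
    # Half-life Rank-Score: each hit is discounted exponentially by
    # its position, so hits near the top of the list weigh more.
    score = sum(2 ** (-pos / (half_life - 1))
                for pos, item in enumerate(ranked_items) if item in relevant)
    best = sum(2 ** (-pos / (half_life - 1)) for pos in range(len(relevant)))
    return score / best if best > 0 else 0.0

In both metrics, a hit in the first positions of the ranked list is worth more than the same hit further down, which is exactly the property that distinguishes them from plain recall.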
We use a variation of Precision as follows: if any video
watched by a user in the testing set appears in the
recommended item set R, the precision value is 100%;
otherwise, it is 0%. This change was made because
our database contains all video views from the Sambatech Platform,
but the recommended items were never actually shown to the users.
Thus, we must infer whether or not a user watched an object that
would have been recommended by our technique, which is done
by comparing the videos the user watched with the recommended
item set. Note, however, that a precision of 0% does not mean that
the technique failed, since the object was never offered to the
user; this value is used only to discriminate the results.
Furthermore, in the online video scenario, watching even a single
video from a small group of recommended items can be
treated as a success. In other words, the user does not have
to watch all recommended items to validate the success of the
technique.
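This hit-or-miss precision can be stated in a few lines; the function names and the per-user dictionaries below are hypothetical.

def hit_precision(recommended, watched_later):
    # 100% if at least one video the user watched during the testing
    # period appears in the recommended set R, 0% otherwise.
    return 100.0 if set(recommended) & set(watched_later) else 0.0

def mean_hit_precision(recs_by_user, views_by_user):
    # Average the per-user hit-or-miss precision over the user group U.
    scores = [hit_precision(recs_by_user[u], views_by_user.get(u, []))
              for u in recs_by_user]
    return sum(scores) / len(scores)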
YouTube is an example of this precision scenario: after a
video finishes, many other videos are recommended. If even one
of them is watched (a single hit), the recommendation may be
considered successful.
C. Results
Because of our large database, we must first split the data
into training and testing sets, which was done as presented in
Table III.
Training set    Testing set
07/01/2012      07/02/2012 to 07/07/2012

Table III
DATASET SPLIT
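Assuming the view log is available as a table of (user_id, video_id, date) events, and that the dates in Table III are in MM/DD/YYYY format (1-7 July 2012, matching the one-week period mentioned below), the split could be reproduced roughly as follows; the file and column names are hypothetical.

import pandas as pd

views = pd.read_csv("views.csv", parse_dates=["date"])  # hypothetical log

train = views[views["date"] == "2012-07-01"]                      # one day
test = views[views["date"].between("2012-07-02", "2012-07-07")]   # six days

# Keep only testing views from users present in the training set,
# since the user group U must be the same in both sets.
test = test[test["user_id"].isin(train["user_id"].unique())]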
We must emphasize that the one-week period was chosen
because it is a representative portion of our database and it
covers a large amount of data (millions of user sessions), which
constitutes a consistent empirical validation for recommendation
systems.
The group U of users in the training set must be the same
one used in the testing set, which justifies the data split presented
in Table III. The testing set is larger than the training set,
since we have to use all users from the training set to generate