User–user collaborative filtering, while effective, suffers from scalability
problems as the user base grows. Searching for the neighbors of
a user is an O(|U|) operation (or worse, depending on how similarities
are computing — directly computing most similarity functions
against all other users is linear in the total number of ratings). To
extend collaborative filtering to large user bases and facilitate deployment
on e-commerce sites, it was necessary to develop more scalable
96 Collaborative Filtering Methods
algorithms. Item–item collaborative filtering, also called item-based collaborative
filtering, takes a major step in this direction and is one of
the most widely deployed collaborative filtering techniques today.
Item–item collaborative filtering was first described in the literature
by Sarwar et al. [130] and Karypis [71], although a version of it seems
to have been used by Amazon.com at this time [87]. Rather than using
similarities between users’ rating behavior to predict preferences, item–
item CF uses similarities between the rating patterns of items. If two
items tend to have the same users like and dislike them, then they are
similar and users are expected to have similar preferences for similar
items. In its overall structure, therefore, this method is similar to earlier
content-based approaches to recommendation and personalization, but
item similarity is deduced from user preference patterns rather than
extracted from item data.
In its raw form, item–item CF does not fix anything: it is still necessary
to find the most similar items (again solving the k-NN problem) to
generate predictions and recommendations. In a system that has more
users than items, it allows the neighborhood-finding to be amongst the
smaller of the two dimensions, but this is a small gain. It provides
major performance gains by lending itself well to pre-computing the
similarity matrix.
As a user rates and re-rates items, their rating vector will change
along with their similarity to other users. Finding similar users in
advance is therefore complicated: a user’s neighborhood is determined
not only by their ratings but also by the ratings of other users, so their
neighborhood can change as a result of new ratings supplied by any
user in the system. For this reason, most user–user CF systems find
neighborhoods at the time when predictions or recommendations are
needed.
In systems with a sufficiently high user to item ratio, however, one
user adding or changing ratings is unlikely to significantly change the
similarity between two items, particularly when the items have many
ratings. Therefore, it is reasonable to pre-compute similarities between
items in an item–item similarity matrix. The rows of this matrix can
even be truncated to only store the k most similar items. As users
change ratings, this data will become slightly stale, but the users will
2.3 Item–Item Collaborative Filtering 97
likely still receive good recommendations and the data can be fully
updated by re-computing the similarities during a low-load time for
the system.
Item–item CF generates predictions by using the user’s own ratings
for other items combined with those items’ similarities to the target
item, rather than other users’ ratings and user similarities as in user–
user CF. Similar to user–user CF, the recommender system needs a
similarity function, this time s: I×I → R, and a method to generate
predictions from ratings and similarities.