2.4 Cluster Analysis
The main problem for scaling a CF classifier is the number of operations involved in
computing distances, for instance when finding the k nearest neighbors. A possible
solution is, as we saw in section 2.2.3, to reduce dimensionality. But even if we
reduce the dimensionality of the features, we might still have many objects to compute
the distance to. This is where clustering algorithms can come into play. The same is true
for content-based RS, where distances among objects are needed to retrieve similar
ones. Clustering is sure to improve efficiency because the number of operations
is reduced. However, unlike dimensionality reduction methods, clustering is unlikely
to help improve accuracy. Therefore, clustering must be applied with care
when designing an RS, weighing the trade-off between improved efficiency and
a possible decrease in accuracy.
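
The trade-off described above can be illustrated with a minimal sketch. It assumes a toy two-dimensional data set whose three groups are assigned by hand rather than produced by a clustering algorithm; the function names (brute_force_knn, clustered_knn) and the data are hypothetical and serve only to count distance computations. The restricted search looks only inside the cluster whose centroid is closest to the query, which is one simple way clustering might be used to cut down the search, not the only one.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def brute_force_knn(query, points, k):
    """Return the k nearest points and the number of distances computed."""
    ranked = sorted(points, key=lambda p: euclidean(query, p))
    return ranked[:k], len(points)

def clustered_knn(query, clusters, centroids, k):
    """Search only the cluster whose centroid is closest to the query."""
    nearest = min(range(len(clusters)),
                  key=lambda i: euclidean(query, centroids[i]))
    neighbors, ops = brute_force_knn(query, clusters[nearest], k)
    # One distance per centroid, plus one per member of the chosen cluster.
    return neighbors, ops + len(centroids)

# Hypothetical toy data: three groups assigned by hand, not by a clustering algorithm.
clusters = [
    [(0, 0), (1, 0), (0, 1), (1, 1)],
    [(4, 0), (5, 0), (4, 1), (5, 1)],
    [(0, 5), (1, 5), (0, 6), (1, 6)],
]
centroids = [centroid(c) for c in clusters]  # computed once, offline
all_points = [p for c in clusters for p in c]

query = (2.4, 0.5)  # lies between the first two groups
exact, exact_ops = brute_force_knn(query, all_points, k=3)
approx, approx_ops = clustered_knn(query, clusters, centroids, k=3)

print(exact_ops, approx_ops)          # distance computations in each search
print(len(set(exact) & set(approx)))  # neighbors the two searches agree on
```

In this toy case the restricted search computes 7 distances instead of 12, but it agrees with the exhaustive search on only 2 of the 3 neighbors, because the query lies near a cluster boundary and one true neighbor belongs to the adjacent cluster. This is the kind of efficiency gain and possible accuracy loss that has to be measured on real data before adopting clustering in an RS.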