algorithm as a pre-processing step to help in neighborhood formation. They do not restrict the neighborhood to the cluster the user belongs to but
rather use the distance from the user to different cluster centroids as a pre-selection
step for the neighbors. They also implement a cluster-based smoothing technique in
which missing values for users in a cluster are replaced by cluster representatives.
Their method is reported to perform slightly better than standard kNN-based CF.
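To make the two ideas concrete, the sketch below clusters users with k-means, smooths missing ratings with per-cluster item means, and pre-selects neighbor candidates by centroid distance. It is only an illustration of the scheme described above: the NaN-coded ratings matrix, the use of scikit-learn's KMeans, and all function and parameter names are assumptions for this sketch, not details taken from the cited work.

```python
import numpy as np
from sklearn.cluster import KMeans

def smooth_by_cluster(ratings, n_clusters=10):
    """Cluster users, then replace each user's missing ratings with the
    per-item mean rating of the user's cluster (the cluster representative)."""
    filled = np.nan_to_num(ratings)  # crude zero-fill so k-means can run
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(filled)
    smoothed = ratings.copy()
    for c in range(n_clusters):
        members = km.labels_ == c
        rep = np.nanmean(ratings[members], axis=0)  # cluster representative
        rep = np.nan_to_num(rep)  # items no one in the cluster rated
        smoothed[members] = np.where(np.isnan(ratings[members]),
                                     rep, ratings[members])
    return smoothed, km

def candidate_neighbors(user_vector, km, n_candidate_clusters=3):
    """Pre-select neighbor candidates from the clusters whose centroids are
    closest to the target user, not only the user's own cluster."""
    dists = np.linalg.norm(km.cluster_centers_ - user_vector, axis=1)
    closest = np.argsort(dists)[:n_candidate_clusters]
    return np.isin(km.labels_, closest)  # boolean mask over all users
```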
In a similar way, Sarwar et al. [26] describe an approach to building a scalable kNN classifier. They partition the user space by applying the bisecting k-means algorithm
and then use those clusters as the basis for neighborhood formation. They report a
decrease in accuracy of around 5% as compared to standard kNN CF. However, their
approach allows for a significant improvement in efficiency.
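Bisecting k-means itself is simple to state: starting from a single cluster, repeatedly split one cluster with a standard 2-means step until k clusters remain. The sketch below implements one common variant of that loop; the splitting criterion (always bisect the largest cluster) and the scikit-learn-based 2-means step are assumptions of this sketch rather than details of [26].

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, random_state=0):
    """Split the rows of X into k clusters by repeatedly bisecting the
    largest cluster with a 2-means step (one common bisecting strategy)."""
    clusters = [np.arange(X.shape[0])]  # start with all points in one cluster
    while len(clusters) < k:
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(largest)
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=random_state).fit_predict(X[idx])
        clusters.append(idx[halves == 0])
        clusters.append(idx[halves == 1])
    labels = np.empty(X.shape[0], dtype=int)
    for c, idx in enumerate(clusters):
        labels[idx] = c
    return labels  # cluster id per user
```

Restricting the neighbor search to the target user's cluster is what yields the efficiency gain reported above, at the cost of the small drop in accuracy.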
O'Connor and Herlocker [21] present a different approach in which, instead of
users, they cluster items. Using the Pearson correlation similarity measure, they try
out four different algorithms: average-link hierarchical agglomerative clustering [39],
the robust clustering algorithm for categorical attributes (ROCK) [40], kMetis, and hMetis.
Although clustering did improve efficiency, all of their clustering techniques yielded
worse accuracy and coverage than the non-partitioned baseline.
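As an illustration of item clustering, the sketch below computes a Pearson-style item-item similarity and applies average-link hierarchical agglomerative clustering, the first of the four algorithms listed above. Treating missing ratings as zero after mean-centring (rather than correlating only over co-rated users) and the SciPy-based linkage step are simplifying assumptions of this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_items_pearson(ratings, n_clusters):
    """Cluster items (columns of the user-item matrix) with average-link
    agglomerative clustering over a Pearson-correlation distance."""
    items = ratings.T  # (n_items, n_users), NaN marks missing ratings
    centred = items - np.nanmean(items, axis=1, keepdims=True)
    centred = np.nan_to_num(centred)  # missing ratings contribute zero
    norms = np.linalg.norm(centred, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard items with no (or constant) ratings
    normed = centred / norms
    sim = normed @ normed.T  # Pearson correlation in [-1, 1]
    dist = 1.0 - sim  # correlation distance
    np.fill_diagonal(dist, 0.0)
    # Average-link hierarchical agglomerative clustering on the distances.
    Z = linkage(squareform(dist, checks=False), method='average')
    return fcluster(Z, t=n_clusters, criterion='maxclust')  # item -> cluster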
Finally, Li et al. [60] and Ungar and Foster [72] present very similar approaches in which k-means clustering is used to solve a probabilistic model interpretation of the recommender problem.
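Ungar and Foster's model is probabilistic, linking clusters of users to clusters of items; the sketch below replaces that formulation with a much cruder stand-in that clusters both users and items with k-means and predicts each missing rating from the mean of its (user cluster, item cluster) block. Everything here, from the double k-means to the block-mean estimate, is an assumption made for illustration, not the cited authors' method.

```python
import numpy as np
from sklearn.cluster import KMeans

def cocluster_predict(ratings, n_user_clusters=8, n_item_clusters=8):
    """Cluster users and items with k-means, then predict each rating as the
    mean observed rating of its (user cluster, item cluster) block."""
    filled = np.nan_to_num(ratings)  # NaN marks missing ratings
    u = KMeans(n_clusters=n_user_clusters, n_init=10,
               random_state=0).fit_predict(filled)
    v = KMeans(n_clusters=n_item_clusters, n_init=10,
               random_state=0).fit_predict(filled.T)
    overall = np.nanmean(ratings)  # fallback for blocks with no ratings
    pred = np.full_like(filled, overall)
    for a in range(n_user_clusters):
        for b in range(n_item_clusters):
            block = ratings[np.ix_(u == a, v == b)]
            if np.any(~np.isnan(block)):
                pred[np.ix_(u == a, v == b)] = np.nanmean(block)
    return pred  # read predictions where ratings is NaN
```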