Recommendation systems have long served the e-commerce industry with
recommendations for movies, books, travel packages, and more. A user's activity or
purchase history is used to generate predictions for that user. YouTube's video
recommendation system, Amazon's “You may also like…” and Pandora's music
recommendation system are a few popular examples. Both explicit and implicit feedback
are used to predict a customer's preferences and recommend items accordingly.
As recommendation systems have evolved, two primary types have emerged: content-based
and collaborative-filtering-based recommendation systems.
Content-based recommendation systems recommend items similar to those a
user has liked in the past. Recommendation systems based on collaborative filtering
recommend items liked by similar users: users who have liked similar items are identified,
and the items those users liked most are recommended. For both content-based and
collaborative-filtering-based systems to predict a rating, it is essential to establish a
similarity between items. We explored correlation and clustering as ways to establish
similarity and observed that correlation captured similarity better than clustering alone.
With the intuition that clustering items into similar groups and then employing correlation
to determine similarities could improve predictions, we developed an algorithm that combines
clustering and correlation to generate a prediction for an item rating. We have
experimented with adding contextual information to generate better predictions. Our results
suggest that predictions generated using clustering alone improved when clustering was
replaced with correlation. Further, the combination of both improved the predictions over
clustering alone, but correlation still delivered the best results overall. We also found
that bringing in more information may not always help. In this thesis, we compare these three
algorithms and present our analysis with results.
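
To make the combined approach concrete, the following is a minimal sketch of one way a cluster-then-correlate prediction could work. It is not the implementation evaluated in this thesis. The toy ratings, the precomputed cluster labels in `cluster_of`, the choice of Pearson correlation, the use of only positively correlated neighbours, and the function names `pearson` and `predict` are all assumptions made for illustration.

```python
from math import sqrt

# Toy data (hypothetical): ratings[item][user] = rating on a 1-5 scale.
ratings = {
    "A": {"u1": 5, "u2": 3, "u3": 1},
    "B": {"u1": 4, "u2": 3, "u3": 2, "u4": 4},
    "C": {"u1": 1, "u2": 3, "u3": 5, "u4": 2},
    "D": {"u1": 2, "u2": 5, "u4": 1},
}

# Assumed to come from a prior clustering step (any method); labels are made up.
cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1}


def pearson(a, b):
    """Pearson correlation between two items over the users who rated both."""
    common = [u for u in a if u in b]
    if len(common) < 2:
        return 0.0
    xs = [a[u] for u in common]
    ys = [b[u] for u in common]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(vx * vy)


def predict(user, target, ratings, cluster_of):
    """Correlation-weighted average of the user's ratings on same-cluster items.

    Returns None when the user has rated no positively correlated item
    in the target item's cluster.
    """
    num = den = 0.0
    for item, by_user in ratings.items():
        if item == target or user not in by_user:
            continue
        if cluster_of[item] != cluster_of[target]:
            continue  # clustering step: ignore items outside the cluster
        sim = pearson(ratings[target], by_user)
        if sim > 0:  # correlation step: weight by similarity
            num += sim * by_user[user]
            den += sim
    return num / den if den else None


print(predict("u4", "A", ratings, cluster_of))  # prints 4.0
```

In this toy example, item B is perfectly correlated with item A, item C is negatively correlated and therefore skipped, and item D is excluded because it lies in a different cluster, so the prediction for user u4 on item A equals that user's rating of B.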