Approaches
Many different approaches have been applied to the basic problem of making accurate and efficient recommender and data mining
systems. Many of the technologies used in the actual recommender systems studied are fairly simple database queries. Automatic
recommender systems, however, use a wide range of techniques, ranging from nearest neighbor algorithms to Bayesian analysis. The
worst-case performance of many of these algorithms is known to be poor. However, many of the algorithms have been tuned to use
heuristics that are particularly efficient on the types of data that occur in practice.
The earliest recommenders used nearest-neighbor collaborative filtering algorithms (Resnick et al. 1994, Shardanand et al. 1995).
Nearest neighbor algorithms are based on computing the distance between consumers based on their preference history. Predictions
of how much a consumer will like a product are computed by taking the weighted average of the opinions of a set of nearest
neighbors for that product. Neighbors who have expressed no opinion on the product in question are ignored. Opinions should be
scaled to adjust for differences in rating tendencies between users (Herlocker et al., 1999). Nearest neighbor algorithms have the
advantage of being able to rapidly incorporate the most up-to-date information, but the search for neighbors is slow in large
databases. Practical algorithms use heuristics to search for good neighbors and may use opportunistic sampling when faced with very
large populations.
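The prediction step described above can be sketched in a few lines. The following is a minimal, self-contained illustration, not any system's actual implementation: the ratings data and user names are invented, Pearson correlation stands in as one common choice of similarity measure, and ratings are mean-centered per user as a simple form of the scaling for rating tendencies noted above.

```python
import math

# Toy preference histories: user -> {item: rating}. Illustrative data only.
ratings = {
    "ann":   {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 3, "b": 1, "c": 2, "d": 3},
    "carol": {"a": 4, "b": 3, "d": 5},
    "dave":  {"b": 1, "c": 5, "d": 4},
}

def mean(u):
    r = ratings[u]
    return sum(r.values()) / len(r)

def similarity(u, v):
    """Pearson correlation over co-rated items: one way to measure the
    'distance' between two consumers' preference histories."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu, mv = mean(u), mean(v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def predict(user, item, k=2):
    """Weighted average of the k nearest neighbors' mean-centered opinions.
    Neighbors with no opinion on the item are ignored from the start."""
    cands = [(similarity(user, v), v) for v in ratings
             if v != user and item in ratings[v]]
    cands.sort(reverse=True)
    top = [(s, v) for s, v in cands[:k] if s > 0]
    if not top:
        return mean(user)  # no useful neighbors: fall back to the user's own average
    num = sum(s * (ratings[v][item] - mean(v)) for s, v in top)
    den = sum(abs(s) for s, v in top)
    return mean(user) + num / den
```

Note that the candidate scan above is linear in the number of users, which is exactly the cost the heuristic neighbor search and opportunistic sampling mentioned in the text aim to avoid in large databases.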
Bayesian networks create a model based on a training set with a decision tree at each node and edges representing consumer
information. The model can be built off-line over a matter of hours or days. The resulting model is very small, very fast, and
essentially as accurate as nearest neighbor methods (Breese et al., 1998). Bayesian networks may prove practical for environments in
which knowledge of consumer preferences changes slowly with respect to the time needed to build the model but are not suitable for
environments in which consumer preference models must be updated rapidly or frequently.
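A full Bayesian network with a decision tree at each node is too involved for a short sketch, but the offline/online split the paragraph describes can be illustrated with a much simpler Bayesian stand-in: a Laplace-smoothed naive Bayes predictor for a single item, built once from a training set and then queried by table lookup. The data and item names are invented for illustration; this is not the model of Breese et al.

```python
from collections import defaultdict

# Training set: each row maps items to ratings. The rating of one target
# item is predicted from the ratings of the others. Toy data only.
train = [
    {"a": 1, "b": 1, "d": 1},
    {"a": 1, "b": 0, "d": 1},
    {"a": 0, "b": 1, "d": 0},
    {"a": 0, "b": 0, "d": 0},
]

def build_model(rows, target, alpha=1.0):
    """Offline step: one pass over the training set to collect
    Laplace-smoothed counts. This is the slow, periodic phase."""
    classes = sorted({r[target] for r in rows if target in r})
    prior = {c: alpha for c in classes}
    cond = defaultdict(lambda: defaultdict(lambda: alpha))  # per-class (item, value) counts
    for r in rows:
        if target not in r:
            continue
        c = r[target]
        prior[c] += 1
        for item, val in r.items():
            if item != target:
                cond[c][(item, val)] += 1
    return classes, prior, cond

def predict(model, evidence):
    """Online step: a handful of table lookups and multiplications,
    which is why the built model is small and fast to query."""
    classes, prior, cond = model
    scores = {}
    for c in classes:
        p = prior[c]
        for item, val in evidence.items():
            p *= cond[c][(item, val)] / prior[c]
        scores[c] = p
    return max(scores, key=scores.get)

model = build_model(train, "d")
print(predict(model, {"a": 1, "b": 0}))  # -> 1
```

The design mirrors the trade-off in the text: `build_model` can run off-line on a schedule, while `predict` stays cheap, but any new consumer data is invisible until the model is rebuilt.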
Clustering techniques work by identifying groups of consumers who appear to have similar preferences. Once the clusters are
created, predictions for an individual can be made by averaging the opinions of the other consumers in that cluster. Some clustering
techniques represent each consumer with partial participation in several clusters. The prediction is then an average across the
clusters, weighted by degree of participation. Clustering techniques usually produce less-personal recommendations than other
methods, and in some cases, the clusters have worse accuracy than nearest neighbor algorithms (Breese et al., 1998). Once the
clustering is complete, however, performance can be very good, since the size of the group that must be analyzed is much smaller.
Clustering techniques can also be applied as a “first step” for shrinking the candidate set in a nearest neighbor algorithm or for
distributing nearest-neighbor computation across several recommender engines. While dividing the population into clusters may hurt
the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and performance.
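The partial-participation variant described above reduces to a short weighted average once cluster memberships and per-cluster opinions exist. The sketch below assumes both are already computed (by any soft clustering method); the membership degrees, cluster names, and opinion values are invented for illustration.

```python
# Soft cluster memberships: user -> {cluster: degree of participation}.
# Per-cluster opinions: cluster -> {item: average rating within that cluster}.
# Toy numbers only; in practice these come from an offline clustering step.
membership = {
    "ann": {"c1": 0.8, "c2": 0.2},
    "bob": {"c1": 0.3, "c2": 0.7},
}
cluster_opinion = {
    "c1": {"x": 4.5, "y": 2.0},
    "c2": {"x": 3.0, "y": 4.0},
}

def predict(user, item):
    """Average across clusters, weighted by the user's degree of
    participation. Clusters with no opinion on the item are skipped."""
    num = den = 0.0
    for cluster, weight in membership[user].items():
        if item in cluster_opinion[cluster]:
            num += weight * cluster_opinion[cluster][item]
            den += weight
    return num / den if den else None
```

Because the per-cluster opinions are precomputed, each prediction touches only a handful of clusters rather than the whole population, which is the performance advantage the text attributes to clustering.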
