Recommender systems do not exist in a vacuum. Treating recommendation
abstractly as a mathematical problem, aiming primarily to improve offline evaluations with prediction accuracy metrics such as RMSE, ignores the broader context of use and does not necessarily measure the impact these systems have on their users. Re-grounding recommender
systems in user needs can have a profound impact on how
we approach the field. Anchoring both the design of the recommender
system itself and the evaluation strategy used to measure its effectiveness
to a detailed understanding of user goals, tasks, and context can
enable us to build systems that better serve their users [74, 99].
Users engage with a recommender system for some purpose. With systems
like GroupLens and PHOAKS, that purpose can be to more easily filter
through high volumes of articles and find interesting new resources. Movie
recommenders can help users find new movies and choose films to watch.
A joke recommender can provide entertainment. In each case, we can consider
the recommendations to have some utility to the user. User needs can
also extend beyond the recommendation list — some users interact with
recommender systems for the purpose of self-expression, with the rating
process rather than the resulting recommendations being their desired
end [56, 61]. Evaluation metrics are useful, therefore, to the extent that
they map to (or at least correlate with) the user’s utility derived from
the system’s output (recommendations and predictions) and the overall
experience it provides. Task- and need-driven user-based studies are
needed to determine what factors actually do affect the system’s ability
to meet user needs and improve the user experience.
5.1 User Tasks
The classical recommender tasks of predict and recommend can be
re-cast in terms of user needs as “estimate how much I will like an
item” and “find items I will like”, respectively [61]. User needs can also
take more nuanced manifestations. For example, users can use a recommender
system to find new items they may like (introduction) or to recall
previous items they enjoyed (reuse); systems like the Pandora music
recommender are built to meet these two needs in balance (users want
to discover new music while also listening to music they know they like).
Users can also be interested in merely exploring the item space (explore,
the “Just Browsing” task of Herlocker et al. [61]) rather than making
a particular decision (make decision). Some users have the goal of
determining the recommender’s credibility (evaluate recommender) —
they may wish to see how the recommender does at estimating their
preference for items they know well in order to determine how much
they trust its unfamiliar recommendations (or how much time they
want to invest in training the system). By supporting this last task, a
balance of familiar and unfamiliar items can be important in developing long-term relationships between users and recommenders [99].
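One way to make the two classical tasks concrete is as two separate operations on a recommender. The sketch below is a hypothetical interface, not an API from any system discussed here; the names, parameters, and types are illustrative assumptions:

```python
from typing import Protocol

class Recommender(Protocol):
    """Hypothetical interface separating the two classical tasks.

    Nothing here is a standard API; names and types are illustrative.
    """

    def predict(self, user: int, item: int) -> float:
        """'Estimate how much I will like an item': a preference score."""
        ...

    def recommend(self, user: int, n: int = 10) -> list[int]:
        """'Find items I will like': a ranked list of top-n item IDs."""
        ...
```

Tasks such as introduction, reuse, and explore can then be understood as different criteria for judging the output of these same operations, rather than as different operations.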
The type of interaction the user has with the recommender also
impacts how it should perform. Does the user treat the recommender
as an information source to be searched, or as a decision support tool?
Even when using a recommender for decision support, users still have
differing goals. In some cases they may be primarily interested in exploring
the space of options. They may turn to the recommender to provide
them with a candidate set from which they will choose. In other
cases, they may want the recommender to actually make the selection.
A movie recommender or an e-commerce site is likely to be used
to explore or determine a candidate set; the user generally makes the
final decision on which movie to see. Real-time music recommenders
such as Pandora, however, select a song for the user to hear next, providing the user with a means of critiquing the decision but often not allowing the user to select directly from a set of items.
Even recommenders with equivalent numerical performance can
have qualitative differences in their result lists [146]. McNee et al. [99]
call these personalities, and choosing an algorithm whose personality
matches the user’s needs can provide greater user satisfaction.
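To make this concrete, the following minimal sketch (with entirely hypothetical prediction pairs and item IDs, not data from [99] or [146]) shows how two algorithms can be indistinguishable by RMSE yet produce largely disjoint top-N lists:

```python
def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    return (sum((p - a) ** 2 for p, a in pairs) / len(pairs)) ** 0.5

def top_n_overlap(list_a, list_b, n=10):
    """Jaccard overlap between two top-n recommendation lists."""
    a, b = set(list_a[:n]), set(list_b[:n])
    return len(a & b) / len(a | b)

# Two algorithms with identical accuracy on held-out predictions...
alg1 = [(4.5, 5.0), (3.0, 2.5), (4.0, 4.5)]
alg2 = [(5.0, 4.5), (2.5, 3.0), (4.5, 4.0)]
print(rmse(alg1), rmse(alg2))  # 0.5 and 0.5

# ...whose top-10 lists nevertheless share only two items.
top1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
top2 = [1, 2, 11, 12, 13, 14, 15, 16, 17, 18]
print(top_n_overlap(top1, top2))  # 0.111...
```

Measures such as top-N list overlap, used alongside accuracy metrics, are one way such differences in personality can be made visible in evaluation.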
5.2 Needs for Individual Items
Thinking about prediction from the perspective of user perception has led to the adoption of several common evaluation metrics: RMSE’s increased penalty for large errors, and variants on MAE designed to distinguish large errors, both stem from the assumption that users are likely to forgive a small error in preference estimation (such as mispredicting by half a star), while gross errors (predicting a 3-star movie as a 5-star one) will more significantly hamper the user experience.
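As a small worked illustration (with hypothetical error values): consider four predictions that are each off by half a star, versus four predictions of which one is off by two stars and the rest are exact. The two cases are indistinguishable under MAE,

\[ \mathrm{MAE} = \frac{0.5 + 0.5 + 0.5 + 0.5}{4} = \frac{2 + 0 + 0 + 0}{4} = 0.5, \]

but RMSE doubles for the case containing the single gross error:

\[ \sqrt{\frac{4 \times 0.5^2}{4}} = 0.5 \quad\text{versus}\quad \sqrt{\frac{2^2}{4}} = 1.0. \]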
It is also frequently more important to be accurate at the high end of the scale: if the system correctly predicts that the user will dislike an item, does it matter how much they will dislike it?