This paper has two main results. First, we confirmed
that time until task completion is negatively correlated
with user satisfaction on all levels. Second, we have
demonstrated that time until task completion can be used
as a metric to differentiate ranking algorithms of moderately
different quality in a reasonably sized experiment. However,
because task completion times for the same tasks vary
substantially across users, using a cross-over design
provides considerable gains in efficiency.
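
As a rough illustration of why a cross-over design helps, the sketch below compares the variance of the estimated difference in mean completion time between two rankers under a between-subjects design and under a cross-over design. It assumes a simple additive model (completion time = user effect + ranker effect + independent noise) with no period or carry-over effects. The model, the function names, and the parameters `sigma_user` and `sigma_noise` are illustrative assumptions and are not taken from the experiments reported in this paper.

```python
def between_subjects_variance(sigma_user, sigma_noise, n_users):
    """Variance of the estimated mean difference when each user sees only
    one ranker (n_users split evenly between the two rankers).

    Assumed model: time = user effect + ranker effect + noise, so each
    observation carries both the user variance and the noise variance.
    """
    per_arm = n_users / 2
    return 2 * (sigma_user ** 2 + sigma_noise ** 2) / per_arm


def crossover_variance(sigma_noise, n_users):
    """Variance of the estimated mean difference when every user completes
    tasks with both rankers.

    The per-user difference cancels the user effect, leaving only the
    noise from the two measurements (assumes no carry-over effect).
    """
    return 2 * sigma_noise ** 2 / n_users


def relative_efficiency(sigma_user, sigma_noise, n_users):
    """How many times smaller the cross-over variance is for the same
    number of users (hypothetical helper for this sketch)."""
    return (between_subjects_variance(sigma_user, sigma_noise, n_users)
            / crossover_variance(sigma_noise, n_users))
```

Under these assumptions the ratio simplifies to 2(sigma_user^2 + sigma_noise^2) / sigma_noise^2, which does not depend on the number of users. The larger the variation between users relative to the within-user noise, the more a cross-over design gains; with the purely hypothetical values sigma_user = 2 and sigma_noise = 1, the cross-over estimate has one tenth of the variance.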