We consider experiments that measure the quality of a web search algorithm by the total time users take to complete assigned search tasks with that algorithm. First, we analyze our data to verify that, for the types of tasks under consideration, there is in fact a negative relationship between a user’s total search time and that user’s satisfaction. Second, we fit a model with the user’s total search time as the response in order to compare two different search algorithms. Finally, we propose an alternative experimental design and demonstrate that it substantially improves on our current design in terms of variance reduction and efficiency.
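The abstract does not state which model is fitted, so the following is only a minimal sketch of the two analysis steps it names, under loudly stated assumptions. It assumes a Pearson correlation as the check for a negative time–satisfaction relationship, and an ordinary least-squares fit of total search time on a single 0/1 indicator for the algorithm as the comparison model. The helper names `pearson_r` and `fit_algorithm_effect` and all data values are hypothetical, chosen for illustration.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_algorithm_effect(times, is_b):
    """OLS fit of total_time = a + b * is_b, where is_b is 0 (algorithm A) or 1 (B).

    Hypothetical helper. With a single binary regressor, the intercept a is the
    mean total time under A and the slope b is the mean under B minus the mean under A.
    """
    xs = [1.0 if flag else 0.0 for flag in is_b]
    mx, my = mean(xs), mean(times)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, times)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return a, b


# Hypothetical data: total search time (seconds) and a 1-5 satisfaction score per user.
times = [310, 295, 402, 280, 260, 245, 330, 270]
satisfaction = [3, 4, 2, 4, 5, 5, 3, 4]
is_b = [False] * 4 + [True] * 4  # first four users saw algorithm A, last four saw B

r = pearson_r(times, satisfaction)  # negative for this toy data
a, b = fit_algorithm_effect(times, is_b)
print(f"r = {r:.3f}, intercept = {a}, slope = {b}")  # intercept 321.75, slope -45.5
```

A negative slope here would mean shorter total search time under algorithm B; the sketch makes no claim about the variance of that estimate, which is what the proposed alternative design is meant to reduce.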
Categories and Subject Descriptors: H.1 [Information
Systems]: Models and principles; H.3 [Information Systems]:
Information storage and retrieval; G.3 [Mathematics of Computing]:
Probability and Statistics
General Terms: Design, Experimentation, Measurement
Keywords: Evaluation metrics, Experiment design, Interactive
IR and visualization, Question answering