Given a query, an engine classifies each doc as “Relevant” or “Nonrelevant”
The accuracy of an engine: the fraction of these classifications that are correct
(tp + tn) / ( tp + fp + fn + tn)
Accuracy is a commonly used evaluation measure in machine learning classification work
Why is this not a very useful evaluation measure in IR?