Impact of assessor disagreement on absolute performance measures can be significant (0.32 vs 0.39)
Little impact on the ranking of different systems or on their relative performance
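To make the distinction concrete, the Python sketch below scores the same ranked lists against two hypothetical assessors' judgments and then compares the resulting system orderings with Kendall's tau. Kendall's tau is a rank-correlation measure chosen here as an assumption; the slides do not name one. All document IDs, rankings, and judgment sets are invented toy data for a single query, so per-query average precision stands in for a mean over many queries.

```python
from itertools import combinations

def average_precision(ranking, relevant):
    """Uninterpolated average precision of one ranked list against a set of relevant doc IDs."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    return precision_sum / len(relevant)

def kendall_tau(xs, ys):
    """Kendall's tau-a between two score lists for the same systems (tied pairs count as neither)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical toy data: three systems' rankings for one query.
rankings = {
    "sys1": ["d1", "d2", "d3", "d4", "d5"],
    "sys2": ["d2", "d6", "d1", "d7", "d3"],
    "sys3": ["d6", "d7", "d8", "d1", "d9"],
}
# Hypothetical judgments: assessor 2 is stricter and does not accept d3.
judge1 = {"d1", "d2", "d3"}
judge2 = {"d1", "d2"}

systems = sorted(rankings)
ap1 = [average_precision(rankings[s], judge1) for s in systems]
ap2 = [average_precision(rankings[s], judge2) for s in systems]

for s, a, b in zip(systems, ap1, ap2):
    print(f"{s}: AP={a:.3f} (judge 1) vs {b:.3f} (judge 2)")
print("Kendall's tau between system orderings:", kendall_tau(ap1, ap2))
```

With these invented numbers, sys2 moves from 0.756 to 0.833 and sys3 from 0.083 to 0.125, while the ordering sys1 > sys2 > sys3 is unchanged (tau = 1.0). This mirrors the pattern above: absolute scores shift, relative ranking does not.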
Suppose we want to know if algorithm A is better than algorithm B
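Because relative comparisons are stable across assessors, a standard information retrieval experiment (both algorithms run on the same queries and scored with the same relevance judgments) will give us a reliable answer to this question.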
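A standard experiment of this kind scores both algorithms on the same queries with the same judgments and compares their mean effectiveness. The Python sketch below assumes precision at 3 as the measure (the slides do not fix one) and uses invented queries, runs, and judgments. It also counts per-query wins, a simple way to see whether a difference in means is consistent across queries rather than driven by a single query.

```python
from statistics import mean

def precision_at_k(ranking, relevant, k=3):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def evaluate(run, qrels, k=3):
    """Per-query precision@k for one system's run over every judged query."""
    return {q: precision_at_k(run[q], qrels[q], k) for q in qrels}

# Hypothetical test collection: relevance judgments shared by both systems.
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}, "q3": {"d7", "d8"}}
# Hypothetical ranked output of algorithm A and algorithm B for each query.
run_a = {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d5", "d6"], "q3": ["d7", "d9", "d8"]}
run_b = {"q1": ["d2", "d1", "d5"], "q2": ["d5", "d6", "d4"], "q3": ["d9", "d7", "d0"]}

scores_a = evaluate(run_a, qrels)
scores_b = evaluate(run_b, qrels)
mean_a = mean(scores_a.values())
mean_b = mean(scores_b.values())
wins_a = sum(scores_a[q] > scores_b[q] for q in qrels)  # queries where A scores higher
wins_b = sum(scores_b[q] > scores_a[q] for q in qrels)  # queries where B scores higher

print(f"mean P@3: A={mean_a:.3f}, B={mean_b:.3f}")
print(f"queries where A wins: {wins_a}, B wins: {wins_b}")
```

Here A averages 0.556 against B's 0.333, wins two queries and ties the third; a real experiment would use a much larger query set.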