Most classification tasks are evaluated using standard information retrieval metrics,
such as accuracy, precision, recall, the F measure, and ROC curve analysis.
Each of these metrics was described in detail in Chapter 8. Of these metrics, the
most commonly used are accuracy and the F measure.
There are two major differences between evaluating classification tasks and
other retrieval tasks. The first difference is that the notion of “relevant” is replaced
with “is classified correctly.” The other major difference is that microaveraging,
which is not commonly used to evaluate retrieval tasks, is widely used in classification
evaluations. Macroaveraging for classification tasks involves computing
some metric for each class and then computing the average of the per-class metrics.
On the other hand, microaveraging computes a metric for every test instance
(document) and then averages over all such instances. It is often valuable to compute
and analyze both the microaverage and the macroaverage, especially when
the class distribution P(c) is highly skewed.
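
The difference between the two averages is easiest to see on a small skewed test set. The following is a minimal sketch, not a definitive implementation. It assumes accuracy is the metric and takes a class's accuracy to be the fraction of that class's test instances that are classified correctly. The function name `micro_macro_accuracy` and the "ham" and "spam" labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def micro_macro_accuracy(true_labels, predicted_labels):
    """Return (microaverage, macroaverage) of classification correctness.

    Assumption: per-class accuracy is the fraction of a class's test
    instances that were assigned the correct label.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one test instance is required")

    correct_by_class = defaultdict(int)
    total_by_class = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total_by_class[truth] += 1
        if truth == prediction:
            correct_by_class[truth] += 1

    # Microaverage: score every test instance (1 if correct, 0 otherwise)
    # and average over all instances.
    micro = sum(correct_by_class.values()) / len(true_labels)

    # Macroaverage: compute the metric for each class, then average the
    # per-class values so that every class counts equally.
    per_class = [correct_by_class[c] / total_by_class[c] for c in total_by_class]
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Hypothetical skewed test set: nine "ham" documents and one "spam"
# document, scored against a classifier that always predicts "ham".
true_labels = ["ham"] * 9 + ["spam"]
predicted_labels = ["ham"] * 10
micro, macro = micro_macro_accuracy(true_labels, predicted_labels)
print(micro, macro)  # 0.9 0.5
```

In this example the microaverage is dominated by the frequent class and looks strong, while the macroaverage exposes the fact that the rare class is never classified correctly. This is why reporting both values is informative when P(c) is skewed.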