In Section 2, we review the notion of calibration of probability
estimates and show that although the scores produced by naive
Bayes and support vector machine (SVM) classifiers tend to rank
examples well, they are not well-calibrated. In Section 3, we review
previous methods for mapping two-class scores into probability estimates,
explain their shortcomings, and present our new method.
In Section 4, we discuss how to combine calibrated two-class probability
estimates into calibrated multiclass probability estimates. In
Section 5, we present an experimental evaluation of these methods
applied to naive Bayes and SVM scores in a variety of domains.
Finally, in Section 6 we summarize the contributions of this paper
and suggest directions for future work.