To evaluate the classifier we use a prequential (test-then-train)
evaluation method: each arriving instance is first used to test the
model and, if we decide to pay the cost for its label, then to update
the current model. In addition, we capture snapshots of the
performance every 100 instances, recording not only the accuracy
but also the kappa statistic and the confusion matrix, which we
use to assess the effect of class imbalance on the minority
class.
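
The evaluation loop described above can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the implementation used in the experiments: the names `MajorityClassLearner`, `want_label`, `prequential`, and `cohen_kappa` are hypothetical, the toy learner stands in for the actual classifier, the metrics are assumed to be cumulative from the start of the stream, and the true label is assumed to be available for scoring even when it is not purchased for training.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner; a hypothetical stand-in for the real classifier."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Most frequent label seen so far; class 0 before any update.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix cm[true][predicted]."""
    n = sum(sum(row) for row in cm)
    if n == 0:
        return 0.0
    k = len(cm)
    p_o = sum(cm[i][i] for i in range(k)) / n  # observed agreement
    # Chance agreement: product of row and column marginals per class.
    p_e = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0


def prequential(stream, model, n_classes, want_label, snapshot_every=100):
    """Test each instance first, train only on purchased labels, and
    record a cumulative snapshot every `snapshot_every` instances."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    snapshots = []
    for t, (x, y) in enumerate(stream, start=1):
        y_hat = model.predict(x)      # 1. test on the arriving instance
        cm[y][y_hat] += 1             #    score against the true label
        if want_label(x, y_hat):      # 2. decide whether to pay for the label
            model.learn(x, y)         # 3. update the model only if we did
        if t % snapshot_every == 0:
            snapshots.append({
                "t": t,
                "accuracy": sum(cm[i][i] for i in range(n_classes)) / t,
                "kappa": cohen_kappa(cm),
                "confusion": [row[:] for row in cm],  # copy, not a reference
            })
    return snapshots


# Imbalanced toy stream: one instance in ten belongs to the minority class 1.
stream = [((i,), 1 if i % 10 == 0 else 0) for i in range(300)]
snapshots = prequential(stream, MajorityClassLearner(), 2,
                        want_label=lambda x, y_hat: True)
```

On such an imbalanced stream a model that always predicts the majority class reaches high accuracy while its kappa stays at zero and the minority row of the confusion matrix shows no correct predictions, which is why the three measures are recorded together.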