To achieve better prediction, most classification algorithms attempt to learn the
borderline of each class as precisely as possible during training. The examples on the
borderline and those nearby (which we call borderline examples in this paper) are more
apt to be misclassified than the examples far from the borderline, and are therefore
more important for classification.
Based on the analysis above, the examples far from the borderline may contribute
little to classification. We therefore present two new minority over-sampling methods,
borderline-SMOTE1 and borderline-SMOTE2, in which only the borderline examples
of the minority class are over-sampled. Our methods differ from the existing
over-sampling methods, which over-sample all the minority examples or a random
subset of the minority class [1], [2], [12].
Our methods are based on SMOTE (Synthetic Minority Over-sampling Technique)
[12]. SMOTE generates synthetic minority examples to over-sample the minority
class. For every minority example, its k (set to 5 in SMOTE) nearest neighbors of the
same class are calculated; some of them are then randomly selected according to the
over-sampling rate. New synthetic examples are generated along the line segments
between the minority example and its selected nearest neighbors.
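For concreteness, the following Python sketch illustrates SMOTE's interpolation step. The function name, its arguments, and the use of numpy are our own illustrative choices under the assumptions stated in the comments, not the original SMOTE implementation.

```python
# A minimal sketch of SMOTE's synthetic-example generation.
# Assumes the minority class has more than k examples and uses
# Euclidean distance; function and variable names are illustrative.
import numpy as np

def smote_sketch(minority, n_new, k=5, rng=np.random.default_rng(0)):
    """Generate n_new synthetic minority examples by interpolation.

    minority : (m, d) array of minority-class examples.
    n_new    : number of synthetic examples (over-sampling rate times m).
    k        : number of same-class nearest neighbors (5 in SMOTE).
    """
    m, d = minority.shape
    # Pairwise distances within the minority class.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    # For each example, the indices of its k nearest same-class neighbors
    # (column 0 of the sort is the example itself, so it is skipped).
    neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]

    synthetic = np.empty((n_new, d))
    for i in range(n_new):
        p = rng.integers(m)                    # pick a minority example
        q = neighbors[p, rng.integers(k)]      # pick one of its k neighbors
        gap = rng.random()                     # random position on the segment
        # The new example lies on the line between p and the chosen neighbor.
        synthetic[i] = minority[p] + gap * (minority[q] - minority[p])
    return synthetic
```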
Unlike the existing over-sampling methods, our methods only over-sample
or strengthen the borderline minority examples. First, we find out the border