The data for the plot in Figure 3 was extracted from a Mammography dataset¹ (Woods
et al., 1993). The minority class samples are shown by '+' and the majority class samples
are shown by 'o' in the plot. In Figure 3(a), the region indicated by the solid-line rectangle
is a majority class decision region. Nevertheless, it contains three minority class samples
(shown by '+'), which are false negatives. If we replicate the minority class, the decision region for the
minority class becomes very specific and will cause new splits in the decision tree. This will
lead to more terminal nodes (leaves) as the learning algorithm tries to learn more and more
specific regions of the minority class; in essence, overfitting. Replication of the minority
class does not cause its decision boundary to spread into the majority class region. Thus,
in Figure 3(b), the three samples previously in the majority class decision region now have
very specific decision regions.
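
The effect described above can be reproduced on a toy problem. The following is a minimal sketch, not the learner or the data behind Figure 3: it assumes a one-dimensional feature, a greedy Gini-based tree, a hypothetical `min_leaf` stopping rule, and a crude collapse of sibling leaves that predict the same class (stand-ins for the stopping and pruning rules of a real decision tree learner). The sample positions are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(points, lo=float("-inf"), hi=float("inf"), min_leaf=3):
    """Grow a greedy 1-D tree; return its leaves as (lo, hi, predicted_label)."""
    labels = [y for _, y in points]
    label = Counter(labels).most_common(1)[0][0]
    xs = sorted({x for x, _ in points})
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2  # candidate threshold between neighbouring values
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        if min(len(left), len(right)) < min_leaf:
            continue  # assumed stopping rule: each leaf needs min_leaf samples
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right]))
        if best is None or score < best[0]:
            best = (score, t, left, right)
    # Stop if no admissible split reduces the impurity.
    if best is None or best[0] >= len(points) * gini(labels):
        return [(lo, hi, label)]
    _, t, left, right = best
    l_leaves = grow(left, lo, t, min_leaf)
    r_leaves = grow(right, t, hi, min_leaf)
    # Crude pruning: sibling leaves with the same prediction are collapsed.
    if (len(l_leaves) == 1 and len(r_leaves) == 1
            and l_leaves[0][2] == r_leaves[0][2]):
        return [(lo, hi, l_leaves[0][2])]
    return l_leaves + r_leaves

# Invented toy data: ten majority samples and one minority sample among them.
majority = [(float(x), "o") for x in range(10)]
minority = [(4.5, "+")]

before = grow(majority + minority)      # original data
after = grow(majority + minority * 3)   # minority class replicated three times

print("leaves before replication:", before)
print("leaves after replication: ", after)
```

Under these assumptions the tree on the original data collapses to a single majority-class leaf, so the minority sample is a false negative. After replication the tree has three leaves, and the only minority-class leaf is the interval (4.25, 4.75], which lies strictly between the neighbouring majority samples at 4 and 5: the region has become more specific rather than wider.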