The key idea of the ADASYN algorithm is to use a density distribution $\hat{r}_i$ as a criterion to automatically decide the number of synthetic samples that need to be generated for each minority data example. Physically, $\hat{r}_i$ is a measurement of the distribution of weights for different minority class examples according to their level of difficulty in learning.
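To make the role of $\hat{r}_i$ concrete, here is a minimal sketch, assuming the standard ADASYN definition: for each minority example, $\hat{r}_i$ is the fraction of majority-class points among its $k$ nearest neighbors, normalized so that the values sum to one. The function name density_ratios and the use of scikit-learn's NearestNeighbors are illustrative assumptions, not part of the original text; X and y are assumed to be NumPy arrays.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_ratios(X, y, minority_label, k=5):
    # For each minority example, r_i is the fraction of majority-class
    # points among its k nearest neighbors in the full dataset; the
    # normalized values r_hat sum to one and form the density
    # distribution that weights synthetic-sample generation.
    X_min = X[y == minority_label]
    # Each query point is its own nearest neighbor, so request k + 1
    # neighbors and discard the first column.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X_min)
    neighbor_labels = y[idx[:, 1:]]                      # shape (n_min, k)
    r = (neighbor_labels != minority_label).mean(axis=1)
    total = r.sum()
    # If no minority example has majority neighbors, every example is
    # equally easy and no adaptive weighting is needed.
    return r / total if total > 0 else r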
The dataset that results from applying ADASYN not only provides a balanced representation of the data distribution (according to the desired balance level defined by the $\beta$ coefficient), but also forces the learning algorithm to focus on those difficult-to-learn examples. This is a major difference compared to the SMOTE [15] algorithm, in which equal numbers of synthetic samples are generated for each minority data example. Our objective here is similar to that of the SMOTEBoost [16] and DataBoost-IM [17] algorithms: providing different weights for different minority examples to compensate for the skewed distributions. However, the approach used in ADASYN is
more efficient since both SMOTEBoost and DataBoost-IM
rely on the evaluation of hypothesis performance to update
the distribution function, whereas our algorithm adaptively
updates the distribution based on the data distribution characteristics.
Hence, there is no hypothesis evaluation required for generating synthetic data samples in our algorithm.
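As a closing illustration of the contrast with SMOTE's uniform allocation, the sketch below turns $\hat{r}_i$ into per-example generation counts, assuming the standard ADASYN formulation: $G = (m_l - m_s) \times \beta$ synthetic samples in total, with $g_i = \hat{r}_i \times G$ assigned to minority example $i$. The function name synthetic_counts is hypothetical, and the sketch assumes the density_ratios helper above.

import numpy as np

def synthetic_counts(r_hat, n_majority, n_minority, beta=1.0):
    # Total number of synthetic samples to generate; beta = 1 requests
    # a fully balanced dataset, beta = 0 requests none.
    G = (n_majority - n_minority) * beta
    # Adaptive allocation: harder-to-learn examples (larger r_hat_i)
    # receive more synthetic samples.
    return np.rint(r_hat * G).astype(int)

# By contrast, SMOTE would generate roughly the same number of
# samples, G / n_minority, for every minority example.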