We experimented with several
different sampling strategies: the natural distribution
(close to 5_95, highly skewed), 20_80 (unbalanced), and 50_50
(balanced). That is, we trained our models on a
training set in which the target was present at its native
prevalence, as well as on training sets in which target prevalence was
enriched by a large multiplier (to ten times and twenty-five
times its native prevalence) or in which the classes were simply balanced.
Comparing these settings should help us understand, to some extent, how
enrichment affects performance. When resampling, the
negative samples are drawn uniformly from the other 19
classes in the original database.
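The enrichment procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact pipeline used here. The helper `build_training_set` and its parameters are hypothetical. The sketch assumes that every target example is kept and that enrichment is achieved only by subsampling negatives. It also reads "drawn uniformly from the other 19 classes" as a uniform draw, without replacement, over the pooled examples of those classes. A per-class stratified draw would be an equally plausible reading.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def build_training_set(
    data: Sequence[Tuple[object, Hashable]],
    target: Hashable,
    pos_fraction: float,
    seed: int = 0,
) -> List[Tuple[object, int]]:
    """Hypothetical helper: keep every example of `target` (label 1) and draw
    negatives (label 0) uniformly, without replacement, from the pooled
    examples of all other classes, so that positives make up `pos_fraction`
    of the returned set."""
    if not 0.0 < pos_fraction < 1.0:
        raise ValueError("pos_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)
    positives = [x for x, y in data if y == target]
    negatives = [x for x, y in data if y != target]  # all other classes, pooled
    # Number of negatives needed for the requested positive:negative ratio.
    n_neg = round(len(positives) * (1.0 - pos_fraction) / pos_fraction)
    if n_neg > len(negatives):
        raise ValueError("not enough negatives for the requested prevalence")
    # Assumption: uniform over pooled negative examples, not stratified by class.
    sampled_neg = rng.sample(negatives, n_neg)
    out = [(x, 1) for x in positives] + [(x, 0) for x in sampled_neg]
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    # Toy data: 20 equally sized classes, so the native target prevalence is 5%.
    data = [(i, i % 20) for i in range(2000)]
    for name, frac in [("natural", 0.05), ("20_80", 0.20), ("50_50", 0.50)]:
        ts = build_training_set(data, target=0, pos_fraction=frac)
        print(name, len(ts), sum(label for _, label in ts))
```

Under this assumption, a fixed pool of positives means that higher enrichment yields a smaller training set. The alternative would be to oversample positives, which keeps the set size fixed but duplicates target examples.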