Varying the Amount of Training Data
This section examines learning curves, which show the effect of gradually increasing
the amount of training data. Again, we use the glass data, but this time
with both IBk and the C4.5 decision tree learner, implemented in Weka as J48.
To obtain learning curves, use FilteredClassifier again, this time in conjunction
with weka.filters.unsupervised.instance.Resample, which extracts a specified
percentage of a given dataset and returns the reduced dataset. Again, the
filter is applied only to the first batch of data, so the test data passes
unmodified through the FilteredClassifier before it reaches the classifier.
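The exercises above are performed in the Weka GUI, but the learning-curve idea itself is easy to see in code. The following is a minimal, language-neutral sketch (not Weka's implementation): a hand-rolled one-nearest-neighbor classifier is trained on growing fractions of a synthetic two-class dataset, mimicking what Resample does to the training batch while the test set stays fixed. All function names and the synthetic data are illustrative assumptions.

```python
# Sketch of a learning curve with a hand-rolled 1-NN classifier.
# The data, seed, and helper names are illustrative, not from Weka.
import random

def one_nn_predict(train, x):
    """Return the label of the training point closest to x (squared distance)."""
    nearest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test instances the 1-NN classifier labels correctly."""
    correct = sum(1 for x, y in test if one_nn_predict(train, x) == y)
    return correct / len(test)

random.seed(1)

def sample(label, cx, cy, n):
    """n points of one class, Gaussian-scattered around (cx, cy)."""
    return [((random.gauss(cx, 1), random.gauss(cy, 1)), label) for _ in range(n)]

# Two well-separated classes; hold out a fixed test set, as Resample
# inside FilteredClassifier leaves the test batch untouched.
data = sample(0, 0, 0, 100) + sample(1, 3, 3, 100)
random.shuffle(data)
train_pool, test = data[:150], data[150:]

# Learning curve: accuracy as the training percentage grows.
for pct in (10, 20, 50, 100):
    n = len(train_pool) * pct // 100
    print(f"{pct:3d}% of training data -> accuracy {accuracy(train_pool[:n], test):.2f}")
```

Typically the accuracy rises (and then flattens) as the training percentage grows, which is the pattern the exercises below ask you to record and compare across learners.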
Exercise 17.2.9. Record in Table 17.3 the data for learning curves for both the
one-nearest-neighbor classifier (i.e., IBk with k = 1) and J48.
Exercise 17.2.10. What is the effect of increasing the amount of training data?
Exercise 17.2.11. Is this effect more pronounced for IBk or J48?