Class Noise and Nearest-Neighbor Learning
Nearest-neighbor learning, like other techniques, is sensitive to noise in the training
data. In this section we inject varying amounts of class noise into the data and
observe the effect on classification performance.
You can flip a certain percentage of class labels in the data to a randomly
chosen other value using an unsupervised attribute filter called AddNoise, in
weka.filters.unsupervised.attribute. However, for this experiment it is important that the
test data remains unaffected by class noise. Filtering the training data without
filtering the test data is a common requirement, and is achieved using a metalearner
called FilteredClassifier, in weka.classifiers.meta, as described near the end
of Section 11.3 (page 444). This metalearner should be configured to use IBk as
the classifier and AddNoise as the filter. FilteredClassifier applies the filter to the
data before running the learning algorithm, processing it in two batches: first the
training data and then the test data. The AddNoise filter adds noise only to the
first batch of data it encounters, so the test data passes through unchanged.
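The following is a minimal sketch in Weka's Java API of the configuration just described: a FilteredClassifier wrapping IBk, with AddNoise as the filter. The dataset file name (glass.arff), the 10% noise level, and the choice of k = 3 are illustrative assumptions, not values prescribed by the exercise.

```java
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.AddNoise;

public class NoisyIBkSetup {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("glass.arff");   // assumed dataset file
    data.setClassIndex(data.numAttributes() - 1);     // class is the last attribute

    AddNoise noise = new AddNoise();
    noise.setPercent(10);                             // flip 10% of class labels (first batch only)

    IBk knn = new IBk();
    knn.setKNN(3);                                    // neighborhood size k

    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(noise);                              // filter is applied before learning
    fc.setClassifier(knn);

    fc.buildClassifier(data);                         // train on the (noisy) training data
    System.out.println(fc);
  }
}
```

The same configuration can of course be set up in the Explorer by choosing FilteredClassifier and editing its filter and classifier properties; the code merely makes the wiring explicit.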
Exercise 17.2.6. Record in Table 17.2 the cross-validated accuracy estimate
of IBk for 10 different percentages of class noise and neighborhood sizes
k = 1, k = 3, and k = 5 (set by the KNN parameter of the k-nearest-neighbor
classifier IBk). A programmatic sketch of this experiment follows.
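The sketch below shows one way the table could be filled programmatically: a 10-fold cross-validation of the FilteredClassifier for each combination of noise percentage and k. The noise levels 0%, 10%, ..., 90%, the dataset file glass.arff, and the random seed are assumptions chosen for illustration; substitute whatever values you use in your own table.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.AddNoise;

public class ClassNoiseExperiment {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("glass.arff");   // assumed dataset file
    data.setClassIndex(data.numAttributes() - 1);

    for (int percent = 0; percent <= 90; percent += 10) {   // class-noise levels (assumed)
      for (int k : new int[] {1, 3, 5}) {                   // neighborhood sizes
        AddNoise noise = new AddNoise();
        noise.setPercent(percent);

        IBk knn = new IBk();
        knn.setKNN(k);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(noise);
        fc.setClassifier(knn);

        // 10-fold cross-validation; noise is added only to the training folds
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(fc, data, 10, new Random(1));
        System.out.printf("noise=%d%%  k=%d  accuracy=%.2f%%%n",
            percent, k, eval.pctCorrect());
      }
    }
  }
}
```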
Exercise 17.2.7. What is the effect of increasing the amount of class noise?
Exercise 17.2.8. What is the effect of altering the value of k?