Visualizing Nearest-Neighbor Learning
Now let’s examine the classification boundaries created by the nearest-neighbor
method. Use the boundary visualizer’s Choose button to select the IBk classifier
(weka.classifiers.lazy.IBk) and plot its decision boundaries for the reduced iris
data.
OneR’s predictions are categorical: For each instance, it predicts exactly one of
the three classes. In contrast, IBk outputs a probability estimate for each class, and
the boundary visualizer uses these estimates to mix the colors red, green, and blue
that correspond to the three classes. IBk estimates class probabilities by finding the
k nearest neighbors of a test instance and counting how many of them fall in each class.
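
The counting scheme just described can be sketched in a few lines of Python. This is an illustrative sketch, not Weka's IBk code: the function names, the toy petal measurements, and the order in which red, green, and blue are assigned to the classes are all assumptions made for the example, and the real IBk may differ in details such as attribute normalization and distance weighting.

```python
import math
from collections import Counter

# Hypothetical class order; which color goes with which class is an assumption.
CLASSES = ["setosa", "versicolor", "virginica"]

# Made-up (petal length, petal width) points, not the actual reduced iris data.
train = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((5.0, 1.7), "versicolor"),
    ((4.9, 1.8), "virginica"),
    ((5.1, 1.9), "virginica"),
    ((5.6, 2.1), "virginica"),
]

def knn_class_probabilities(train, query, k, classes):
    """Estimate class probabilities as the fraction of the k nearest
    training points (Euclidean distance) that belong to each class."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in neighbors)
    return [counts[c] / len(neighbors) for c in classes]

def mix_color(probs):
    """Turn three class probabilities into an (R, G, B) triple in 0..255."""
    return tuple(round(255 * p) for p in probs)

# With k = 1 a single neighbor decides, so the color is pure.
pure = mix_color(knn_class_probabilities(train, (1.4, 0.25), 1, CLASSES))

# With k = 3 near the class border, the neighbors disagree and colors mix.
mixed = mix_color(knn_class_probabilities(train, (4.9, 1.7), 3, CLASSES))
```

In this toy setting the k = 1 query lands on a pure color, while the k = 3 query near the versicolor/virginica border has one versicolor and two virginica neighbors, so its color is a blend of the two corresponding channels.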
Exercise 17.3.5. With k = 1, which is the default value, it seems that the set of
k nearest neighbors could have only one member and therefore the color would
always be pure red, green, or blue. Looking at the plot, this is indeed almost
always the case: There is no mixing of colors because one class gets a probability
of 1 and the others a probability of 0. Nevertheless, there is a small
area in the plot where two colors are in fact mixed. Explain this. (Hint:
Examine the data carefully using the Explorer interface’s Visualize panel.)
Exercise 17.3.6. Experiment with different values of k, say 5 and 10. Describe
what happens as k increases.