17.2 NEAREST-NEIGHBOR LEARNING AND DECISION TREES
In this section you will experiment with nearest-neighbor classification and decision
tree learning. Most of the exercises use a real-world forensic glass classification
dataset.
We begin by taking a preliminary look at the dataset. Then we examine the effect
of selecting different attributes for nearest-neighbor classification. Next we study
class noise and its impact on predictive performance for the nearest-neighbor method.
After that, we vary the training set size, both for nearest-neighbor classification
and for decision tree learning. Finally, you are asked to interactively construct a
decision tree for an image segmentation dataset.
Before continuing, you should review some aspects of the classification task:
• How is the accuracy of a classifier measured?
• To make a good classifier, are all the attributes necessary?
• What is class noise, and how would you measure its effect on learning?
• What is a learning curve?
• If you, personally, had to invent a decision tree classifier for a particular
dataset, how would you go about it?
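As a rough aid to thinking about the first, third, and fourth questions, the following minimal sketch measures accuracy as the fraction of test instances classified correctly, corrupts a chosen fraction of training labels to simulate class noise, and traces a learning curve by growing the training set. It is an illustration only: it uses a hand-written 1-nearest-neighbor rule on synthetic two-class data rather than the glass dataset, and every name in it (`make_data`, `add_class_noise`, and so on) is hypothetical.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the label of the training instance closest to x (squared Euclidean distance)."""
    best = min(train, key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], x)))
    return best[1]

def accuracy(train, test):
    """Fraction of test instances whose predicted label equals the true label."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

def add_class_noise(data, rate, labels, rng):
    """Give roughly a fraction `rate` of the instances a different, randomly chosen label."""
    noisy = []
    for x, y in data:
        if rng.random() < rate:
            y = rng.choice([label for label in labels if label != y])
        noisy.append((x, y))
    return noisy

def make_data(n, rng):
    """Synthetic two-class data: two Gaussian blobs (an assumption, not the glass data)."""
    data = []
    for _ in range(n):
        label = rng.choice(["a", "b"])
        center = 0.0 if label == "a" else 3.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

rng = random.Random(0)
train, test = make_data(200, rng), make_data(100, rng)

# Learning curve: accuracy on a fixed test set as the training set grows.
learning_curve = [(n, accuracy(train[:n], test)) for n in (10, 50, 100, 200)]

# Class noise: accuracy when a fraction of the training labels is corrupted.
noise_curve = [
    (rate, accuracy(add_class_noise(train, rate, ["a", "b"], rng), test))
    for rate in (0.0, 0.2, 0.4)
]

for n, acc in learning_curve:
    print(f"training size {n:3d}: accuracy {acc:.2f}")
for rate, acc in noise_curve:
    print(f"noise rate {rate:.1f}: accuracy {acc:.2f}")
```

Note that noise is added to the training data only, while the test labels stay clean, so any drop in accuracy reflects the effect of noise on learning rather than on evaluation.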