Interactive Decision Tree Construction
One of Weka’s classifiers is interactive: It lets the user—you!—construct your own
classifier. Here’s a competition: Who can build a classifier with the highest predictive
accuracy?
Follow the procedure described in Section 11.2 (page 424). Load the file segment-challenge.arff (in the data folder that comes with the Weka distribution). This dataset
has 20 attributes and 7 classes. It is an image segmentation problem, and the task is
to classify images into seven different groups based on properties of the pixels.
Set the classifier to UserClassifier, in the weka.classifiers.trees package. We use
a separate test set (performing cross-validation with UserClassifier is incredibly
tedious!), so in the Test options box choose the Supplied test set option and click
the Set button. A small window appears in which you choose the test set. Click Open
file and browse to the file segment-test.arff (also in the Weka distribution’s data
folder). On clicking Open, the small window updates to show the number of attributes
(20) in the data. The number of instances is not displayed because test instances
are read incrementally (so that the Explorer interface can process larger test files
than can be accommodated in main memory).
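If you prefer to script this setup, the same two files can be loaded with Weka's Java API. The sketch below is only a convenience for checking the data outside the Explorer; it assumes the ARFF files sit under a local data/ directory (adjust the paths for your installation) and that, as in the distributed files, the class attribute is the last one.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LoadSegmentData {
        public static void main(String[] args) throws Exception {
            // Load training and test sets from the Weka distribution's data folder
            // (paths are assumptions; point them at your own copy of the files).
            Instances train = new DataSource("data/segment-challenge.arff").getDataSet();
            Instances test  = new DataSource("data/segment-test.arff").getDataSet();

            // The class attribute is the last attribute in both files.
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            System.out.println("Training instances: " + train.numInstances());
            System.out.println("Attributes (including class): " + train.numAttributes());
            System.out.println("Classes: " + train.numClasses());
        }
    }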
Click Start. UserClassifier differs from all other classifiers: It opens a special
window and waits for you to build your own classifier in it. The tabs at the top of
the window switch between two views of the classifier. The Tree visualizer shows
the current state of your tree, and the nodes give the number of class values there.
The aim is to come up with a tree whose leaf nodes are as pure as possible.
To begin with, the tree has just one node—the root node—containing all the data.
More nodes will appear when you proceed to split the data in the Data visualizer.
Click the Data visualizer tab to see a two-dimensional plot in which the data
points are color-coded by class, with the same facilities as the Visualize panel
discussed in Section 17.1. Try different combinations of x- and y-axes to get the
clearest separation you can find between the colors. Having found a good separation,
you then need to select a region in the plot: This will create a branch in
the tree. Here’s a hint to get you started: Plot region-centroid-row on the x-axis
and intensity-mean on the y-axis (the display is shown in Figure 11.14(a)); you
can see that the red class (sky) is nicely separated from the rest of the classes
at the top of the plot.
There are four tools for selecting regions in the graph, chosen using the dropdown
menu below the y-axis selector. Select Instance identifies a particular instance. Rectangle
(shown in Figure 11.14(a)) allows you to drag out a rectangle on the graph.
With Polygon and Polyline you build a free-form polygon or draw a free-form
polyline (left-click to add a vertex and right-click to complete the operation).
When you have selected an area using any of these tools, it turns gray. (In Figure
11.14(a) the user has defined a rectangle.) Clicking the Clear button cancels the
selection without affecting the classifier. When you are happy with the selection,
click Submit. This creates two new nodes in the tree, one holding all the instances
covered by the selection and the other holding all remaining instances. These nodes
correspond to a binary split that performs the chosen geometric test.
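Conceptually, each submitted region acts as a membership test on the two attributes currently plotted: instances that fall inside the region follow one branch, and all others follow the other. The fragment below merely illustrates that idea; it is not Weka's internal code, and the rectangle bounds are made-up values standing in for whatever region you happen to drag out.

    // Illustrative only: a rectangle drawn on region-centroid-row (x) versus
    // intensity-mean (y) behaves like the following binary test.
    static boolean inSelectedRegion(double regionCentroidRow, double intensityMean) {
        double xMin = 0, xMax = 255;    // assumed bounds of the rectangle on the x-axis
        double yMin = 110, yMax = 150;  // assumed bounds of the rectangle on the y-axis
        return regionCentroidRow >= xMin && regionCentroidRow <= xMax
            && intensityMean >= yMin && intensityMean <= yMax;
    }
    // Instances for which this test is true go to one child node;
    // all remaining instances go to the other.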
Switch back to the Tree visualizer view to examine the change in the tree.
Clicking on different nodes alters the subset of data that is shown in the Data
visualizer section. Continue adding nodes until you obtain a good separation of
the classes—that is, until the leaf nodes in the tree are mostly pure. Remember, however, that you should not overfit the data, because your tree will be evaluated on the separate test set you supplied.
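That train-on-one-file, test-on-another evaluation can also be reproduced in code. Because UserClassifier needs its interactive window, the sketch below substitutes the standard J48 tree learner purely to illustrate the evaluation step; the file paths are the same assumptions as before.

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class EvaluateOnTestSet {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("data/segment-challenge.arff").getDataSet();
            Instances test  = new DataSource("data/segment-test.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            // UserClassifier requires the interactive window, so an automatic
            // tree learner (J48) stands in here to show the evaluation step.
            J48 tree = new J48();
            tree.buildClassifier(train);

            // Evaluate the trained model on the supplied test set.
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            System.out.println(eval.toSummaryString());
        }
    }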