More on Automatic Attribute Selection
The Select attributes panel allows us to gain insight into a dataset by applying attribute
selection methods to it. However, as with supervised discretization, using this
information to reduce a dataset becomes problematic if some of the reduced data is
used for testing the model (as in cross-validation). Again, the reason is that we have
looked at the class labels in the test data while selecting attributes, and using the
test data to influence the construction of a model biases the accuracy estimates
obtained.
This can be avoided by dividing the data into training and test sets and applying
attribute selection to the training set only. However, it is usually more convenient
to use AttributeSelectedClassifier, one of Weka’s metalearners, which allows an
attribute selection method and a learning algorithm to be specified as part of a
classification scheme. AttributeSelectedClassifier ensures that the chosen set of
attributes is selected based on the training data only.
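The fold-by-fold discipline that AttributeSelectedClassifier enforces can be sketched in plain Python. This is a toy stand-in, not Weka's implementation: the dataset and the scoring function below are invented for illustration, with attribute 0 carrying the only real signal.

```python
import random

random.seed(0)

# Hypothetical toy dataset (not diabetes.arff): 60 instances,
# 5 numeric attributes, binary class. Attribute 0 carries the
# signal; the other four are pure noise.
data = []
for _ in range(60):
    cls = random.randint(0, 1)
    row = [cls + random.gauss(0, 1)] + [random.gauss(0, 1) for _ in range(4)]
    data.append((row, cls))

def score(attr, instances):
    # Crude stand-in for a Weka attribute evaluator: the distance
    # between the per-class means of the attribute.
    vals = {0: [], 1: []}
    for row, cls in instances:
        vals[cls].append(row[attr])
    return abs(sum(vals[0]) / len(vals[0]) - sum(vals[1]) / len(vals[1]))

k = 3
folds = [data[i::k] for i in range(k)]
selected = []
for i in range(k):
    train = [inst for j in range(k) if j != i for inst in folds[j]]
    # The crucial point: attributes are ranked on the training
    # folds ONLY, so the held-out fold's class labels never
    # influence which attributes are chosen.
    best = max(range(5), key=lambda a: score(a, train))
    selected.append(best)

print(selected)
```

Because selection is redone inside every fold, the held-out instances play no part in it, and the cross-validated accuracy estimate stays unbiased.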
Now we test the three attribute selection methods from above in conjunction
with NaïveBayes. NaïveBayes assumes that the attributes are independent given the class, so attribute
selection can be very helpful. You can see the effect of redundant attributes by
adding multiple copies of an attribute using the filter weka.filters.unsupervised.
attribute.Copy in the Preprocess panel. Each copy is obviously perfectly correlated
with the original.
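To see why such copies hurt NaïveBayes in particular: the classifier multiplies one likelihood term per attribute, so a perfect copy re-counts the same evidence and pushes the posterior toward an overconfident extreme. A minimal sketch, using invented conditional probabilities for a single binary attribute A:

```python
# Hypothetical conditional probabilities (purely illustrative).
p_a_given_yes = 0.9
p_a_given_no = 0.4
prior_yes = prior_no = 0.5

def posterior_yes(n_copies):
    """Naive-Bayes posterior P(yes | A observed) when the attribute
    appears n_copies extra times. Each copy re-multiplies the same
    likelihood term, because naive Bayes treats it as an independent
    piece of evidence."""
    like_yes = prior_yes * p_a_given_yes ** (n_copies + 1)
    like_no = prior_no * p_a_given_no ** (n_copies + 1)
    return like_yes / (like_yes + like_no)

for copies in range(4):
    print(copies, round(posterior_yes(copies), 3))
```

With these numbers the posterior climbs from about 0.69 with no copies to about 0.96 with three, even though no new information has been added; this is the distortion the exercise asks you to observe.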
Exercise 17.4.10. Load the diabetes classification data in diabetes.arff and add
copies of the first attribute. Measure the performance of NaïveBayes (with
useSupervisedDiscretization turned on) using cross-validation after you have
added each one. What do you observe?
Do the above three attribute selection methods, used in conjunction with AttributeSelectedClassifier
and NaïveBayes, successfully eliminate the redundant attributes?
Run each method from within AttributeSelectedClassifier to see the effect on
cross-validated accuracy and check the attribute subset selected by each method.
Note that you need to specify the number of ranked attributes to use for the Ranker
method. Set this to 8 because the original diabetes data contains 8 attributes (excluding
the class). Specify NaïveBayes as the classifier to be used inside the wrapper
method because this is the classifier for which we want to select a subset.
Exercise 17.4.11. What can you say regarding the performance of the three
attribute selection methods? Do they succeed in eliminating redundant copies?
If not, why?