Weka’s metalearner CVParameterSelection searches for the best parameter settings
by optimizing cross-validated accuracy on the training data. By default, each
setting is evaluated using tenfold cross-validation. The parameters to optimize are
specified using the CVParameters field in the Generic Object Editor window. For
each parameter, three pieces of information must be supplied: (1) a string that
names it using its letter code (which can be found in the Javadoc for the corresponding
classifier—see Section 14.2, page 525); (2) a numeric range of values
to evaluate; and (3) the number of steps to try in this range (note that the parameter
is assumed to be numeric). Click on the More button in the Generic Object
Editor window for more information and an example.
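For instance, to tune a numeric parameter with letter code K over the range 1 to 10 in 10 steps, the corresponding CVParameters entry (following Weka's "code lower-bound upper-bound steps" format) would be:

```
K 1 10 10
```
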
For the diabetes data used in the previous section, use CVParameterSelection
in conjunction with IBk to select the best value for the neighborhood size, ranging
from 1 to 10 in 10 steps. The letter code for the neighborhood size is K. The
cross-validated accuracy of the parameter-tuned version of IBk is directly comparable
with its accuracy using default settings because tuning is performed by
applying inner cross-validation runs to find the best parameter value for each
training set occurring in the outer cross-validation—and the latter yields the final
performance estimate.
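The nested protocol described above can be sketched in plain Python. This is a conceptual illustration only, not Weka's implementation: a toy 1-D k-nearest-neighbor classifier stands in for IBk, and all function names are invented for the sketch.

```python
# Conceptual sketch of CVParameterSelection's nested cross-validation
# (illustrative Python, not Weka's actual code). A toy 1-D k-NN classifier
# is tuned over k by inner CV inside each outer training fold.
import random

def cross_val_splits(data, folds):
    """Yield (train, test) pairs for k-fold cross-validation."""
    for i in range(folds):
        test = data[i::folds]
        train = [x for j, x in enumerate(data) if j % folds != i]
        yield train, test

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D toy data)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)

def accuracy(train, test, k):
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

def tune_k(train, k_values, folds=10):
    """Inner cross-validation: pick the k with the best CV accuracy,
    using only the training data of the current outer fold."""
    def cv_acc(k):
        scores = [accuracy(tr, te, k) for tr, te in cross_val_splits(train, folds)]
        return sum(scores) / len(scores)
    return max(k_values, key=cv_acc)

# Toy two-class dataset: values drawn around class means 0 and 3.
random.seed(1)
data = [(random.gauss(c, 1.0), c) for c in (0, 3) for _ in range(30)]
random.shuffle(data)

# Outer cross-validation yields the final performance estimate;
# the parameter is re-selected independently for each outer fold.
outer_scores = []
for train, test in cross_val_splits(data, 10):
    best_k = tune_k(train, range(1, 11))
    outer_scores.append(accuracy(train, test, best_k))
print(round(sum(outer_scores) / len(outer_scores), 2))
```

Because the inner loop sees only the outer training fold, the outer accuracy estimate is not biased by the tuning, which is exactly why it is comparable with the estimate for the untuned classifier.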
Exercise 17.4.12. What accuracy is obtained in each case? What value is
selected for the parameter-tuned version based on cross-validation on the full
dataset? (Note: This value is output in the Classifier Output text area because,
as mentioned earlier, the model that is output is the one built from the full
dataset.)
Now consider parameter tuning for J48. If there is more than one parameter string
in the CVParameters field, CVParameterSelection performs a grid search on the
parameters simultaneously. The letter code for the pruning confidence parameter is
C, and you should evaluate values from 0.1 to 0.5 in five steps. The letter code for
the minimum leaf size parameter is M, and you should evaluate values from 1 to 10
in 10 steps.
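With Weka's "code lower-bound upper-bound steps" format, this grid search corresponds to two entries in the CVParameters field:

```
C 0.1 0.5 5
M 1 10 10
```
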
Exercise 17.4.13. Run CVParameterSelection to find the best parameter value
setting. Compare the output you get to that obtained from J48 with default
parameters. Has accuracy changed? What about tree size? What parameter
values were selected by CVParameterSelection for the model built from the
full training set?
