17.4 PREPROCESSING AND PARAMETER TUNING
Now we look at some useful preprocessing techniques, which are implemented as
filters, as well as a method for automatic parameter tuning.
Discretization
As we know, there are two types of discretization techniques: unsupervised ones,
which are “class blind,” and supervised ones, which take the class value of the
instances into account when creating intervals. Weka’s main unsupervised method
for discretizing numeric attributes is weka.filters.unsupervised.attribute.Discretize.
It implements these two methods: equal-width (the default) and equal-frequency
discretization.
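To make the difference between the two modes concrete, here is a minimal, self-contained sketch of how cut points could be chosen in each case. It is not Weka's implementation: the class and method names (BinningSketch, equalWidthCuts, equalFrequencyCuts, binOf, counts) are invented for illustration, and the equal-frequency routine assumes there are at least as many values as bins.

```java
import java.util.Arrays;

public class BinningSketch {

    // Equal-width: divide the range [min, max] into `bins` intervals of the same width.
    static double[] equalWidthCuts(double[] values, int bins) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        double[] cuts = new double[bins - 1];
        for (int i = 1; i < bins; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    // Equal-frequency: place cut points midway between sorted neighbors so that
    // each interval receives roughly the same number of values.
    // Assumes values.length >= bins (an assumption of this sketch).
    static double[] equalFrequencyCuts(double[] values, int bins) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[bins - 1];
        for (int i = 1; i < bins; i++) {
            int idx = i * sorted.length / bins;
            cuts[i - 1] = (sorted[idx - 1] + sorted[idx]) / 2.0;
        }
        return cuts;
    }

    // Index of the interval that v falls into, given ascending cut points.
    static int binOf(double v, double[] cuts) {
        int b = 0;
        while (b < cuts.length && v > cuts[b]) {
            b++;
        }
        return b;
    }

    // Number of values in each interval (the heights of the histogram bars).
    static int[] counts(double[] values, double[] cuts) {
        int[] c = new int[cuts.length + 1];
        for (double v : values) {
            c[binOf(v, cuts)]++;
        }
        return c;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6, 7, 8, 100};
        double[] ew = equalWidthCuts(values, 3);
        double[] ef = equalFrequencyCuts(values, 3);
        System.out.println("equal-width counts:     " + Arrays.toString(counts(values, ew)));
        System.out.println("equal-frequency counts: " + Arrays.toString(counts(values, ef)));
    }
}
```

With the single outlier in this toy data, equal-width binning puts nearly every value into the first interval, whereas equal-frequency binning spreads the values evenly. In the Explorer, the corresponding switch on the unsupervised filter is the useEqualFrequency option.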
Find the glass dataset glass.arff and load it into the Explorer interface. Apply the
unsupervised discretization filter in the two different modes explained previously.
Exercise 17.4.1. What do you observe when you compare the histograms
obtained? The one for equal-frequency discretization is quite skewed for some
attributes. Why?
The main supervised technique for discretizing numeric attributes is
weka.filters.supervised.attribute.Discretize. Locate the iris data, load it, apply the supervised
discretization scheme, and look at the histograms obtained. Supervised discretization
strives to create intervals within which the class distribution is consistent, although
the distributions vary from one interval to the next.
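The idea of seeking intervals with a consistent class distribution can be illustrated by a deliberately simplified sketch: choose the one cut point that minimizes the weighted class entropy of the two resulting intervals. This is only the core step of an entropy-based method. It omits the recursive splitting and the stopping criterion that a full supervised discretizer needs, and it should not be read as the code of Weka's filter. The names EntropySplitSketch, entropy, and bestCut are hypothetical, and the values are assumed to be sorted in ascending order.

```java
public class EntropySplitSketch {

    // Entropy (in bits) of a class distribution given as counts.
    static double entropy(int[] classCounts, int total) {
        double h = 0.0;
        for (int c : classCounts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Returns the cut point with the lowest weighted entropy.
    // values must be sorted ascending; labels[i] is the class index of values[i].
    // Returns NaN if all values are identical, so no cut is possible.
    static double bestCut(double[] values, int[] labels, int numClasses) {
        int n = values.length;
        int[] left = new int[numClasses];
        int[] right = new int[numClasses];
        for (int l : labels) {
            right[l]++;
        }
        double bestScore = Double.POSITIVE_INFINITY;
        double best = Double.NaN;
        for (int i = 0; i < n - 1; i++) {
            // Move instance i from the right interval to the left one.
            left[labels[i]]++;
            right[labels[i]]--;
            if (values[i] == values[i + 1]) {
                continue; // cannot cut between equal values
            }
            int nl = i + 1;
            int nr = n - nl;
            double score = (nl * entropy(left, nl) + nr * entropy(right, nr)) / n;
            if (score < bestScore) {
                bestScore = score;
                best = (values[i] + values[i + 1]) / 2.0;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 1.5, 2.0, 4.0, 4.5, 5.0};
        int[] labels = {0, 0, 0, 1, 1, 1};
        System.out.println("best cut: " + bestCut(values, labels, 2));
    }
}
```

In the toy data each class occupies its own range of values, so the chosen cut separates the classes perfectly and each interval contains a single class.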
Exercise 17.4.2. Based on the histograms obtained, which of the discretized
attributes would you consider to be most predictive?
Reload the glass data and apply supervised discretization to it.
Exercise 17.4.3. For some attributes there is only a single bar in the histogram.
What does that mean?
Discretized attributes are normally coded as nominal attributes, with one value
per range. However, because the ranges are ordered, a discretized attribute is actually
on an ordinal scale. Both filters have the ability to create binary attributes rather than
multivalued ones, by setting the option makeBinary to true.
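One way to preserve the ordering when binary attributes are used is to code an attribute with k ranges as k - 1 binary attributes, one per cut point, each recording whether the value lies above that cut point. The sketch below illustrates this coding. It is an assumption that it corresponds to what the makeBinary option produces, and the names OrdinalBinarySketch and thresholdCode are invented for this example.

```java
import java.util.Arrays;

public class OrdinalBinarySketch {

    // Encode v against ascending cut points: bit j is 1 when v lies above cut j.
    // k ranges (k - 1 cut points) therefore yield k - 1 binary attributes.
    static int[] thresholdCode(double v, double[] cuts) {
        int[] bits = new int[cuts.length];
        for (int j = 0; j < cuts.length; j++) {
            bits[j] = v > cuts[j] ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        double[] cuts = {3.5, 6.5}; // three ranges
        for (double v : new double[] {2.0, 5.0, 100.0}) {
            System.out.println(v + " -> " + Arrays.toString(thresholdCode(v, cuts)));
        }
    }
}
```

Under this coding, values in neighboring ranges differ in exactly one binary attribute, so a learning scheme can still exploit the ordering of the ranges.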