Figure 2 also lists the number of continuous and nominal input attributes for each database. Note that the accuracy for every application that has only numeric attributes is exactly the same for both RBF and HRBF. This is no surprise, since the distance functions are equivalent on numeric attributes. However, on the databases that have some or all nominal attributes, HRBF obtained higher generalization accuracy than RBF in 12 out of 23 cases, 10 of which were significant at the 95% level or above. RBF had a higher accuracy in only four cases, and only one of those (the Zoo data set) had a difference that was statistically significant.
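The equivalence on numeric attributes can be illustrated with a minimal sketch. It assumes a heterogeneous distance that dispatches per attribute, and it uses a simple overlap metric (0 if the values match, 1 otherwise) as a stand-in for the nominal case, which is not necessarily the nominal distance used by HRBF. The function names `euclidean` and `heterogeneous` and the `nominal` index set are hypothetical, and numeric attributes are assumed to be already normalized.

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance over all attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def heterogeneous(x, y, nominal):
    """Per-attribute distance: numeric difference for numeric attributes,
    overlap (0 or 1) for attributes whose index is in `nominal`.
    The overlap metric is an illustrative stand-in only."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in nominal:
            d = 0.0 if a == b else 1.0  # nominal: only equality matters
        else:
            d = a - b                   # numeric: same term as Euclidean
        total += d * d
    return math.sqrt(total)

# With no nominal attributes, both functions compute the same sum of squares.
same_on_numeric = math.isclose(
    heterogeneous([0.2, 0.5], [0.4, 0.1], set()),
    euclidean([0.2, 0.5], [0.4, 0.1]),
)

# With attribute 1 treated as nominal, the arbitrary codes 3 and 1 no longer
# contribute their numeric gap, so the two distances diverge.
d_het = heterogeneous([0.0, 3], [0.0, 1], {1})
d_euc = euclidean([0.0, 3], [0.0, 1])
```

When the set of nominal indices is empty, every term in the sum is the squared numeric difference, so the two distances coincide and a network built on either would see identical inputs.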
It is interesting to note that in the Zoo data set, 15 of the 16 attributes are boolean, and the remaining attribute, while not linear, is actually an ordered attribute. These attributes are tagged as nominal, but the Euclidean distance function is appropriate for them as well.
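A small sketch of why a numeric difference remains reasonable for such attributes, assuming booleans are coded as 0/1 and the ordered attribute takes hypothetical integer levels. The helper names `overlap` and `numeric_diff` are illustrative, and the overlap metric again stands in for a generic nominal distance.

```python
def overlap(a, b):
    """Nominal-style distance: 0 if the values match, 1 otherwise."""
    return 0.0 if a == b else 1.0

def numeric_diff(a, b):
    """Per-attribute term used by a Euclidean-style distance."""
    return float(abs(a - b))

# Boolean attribute coded 0/1: the two measures agree on every pair.
bool_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
booleans_agree = all(overlap(a, b) == numeric_diff(a, b) for a, b in bool_pairs)

# Ordered attribute with hypothetical levels 0..3: the numeric difference
# keeps the ordering (level 3 is farther from 0 than level 1 is), while the
# overlap measure treats every mismatch alike.
near = (numeric_diff(0, 1), overlap(0, 1))
far = (numeric_diff(0, 3), overlap(0, 3))
```

On a 0/1 boolean the numeric difference carries the same information as a match/mismatch test, and on an ordered attribute it preserves ordering information that a purely nominal treatment discards.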
In all, HRBF performed as well as or better than the default algorithm in 26 out of 30 cases.
The above results indicate that the heterogeneous distance function is typically more appropriate than the Euclidean distance function for applications with one or more nominal attributes, and is equivalent to it for domains without nominal attributes.