Overview of the dataset and generated segmentations of the illustrative example
To illustrate the methodology, we performed clustering on the Asia Magazine dataset using the SAS Enterprise
Miner (EM) data mining software. This illustrative dataset, compiled by AsiaWeek magazine, contains business-related
data (e.g. Number of Employees; various measures of Sales, Assets, Equity, Profit) for 1996 on 1000 companies.
To generate several segmentations, we varied the following parameters: the similarity measure (i.e. Least Squares,
Mean Absolute Deviation (MAD), Mid-Range, Newton, Modified Ekblom–Newton); the Cubic Clustering Criterion (i.e. Ward,
Average, Centroid) for merging clusters when using the Least Squares similarity measure; the variable transformation method
(i.e. None, Range, Standard Deviation); and the number of clusters. For Least Squares, we set the Minimum Number of Clusters
to 2 and the Maximum Number of Clusters to 6 for the Number of Clusters parameter, after which the software automatically
selects the most appropriate number of clusters within that range using the Cubic Clustering Criterion (CCC). The other
similarity measures do not allow the user to specify a range, only a specific value for the Number of Clusters parameter
(e.g. 2–5). Some of these parameters have options other than the ones we selected, but for illustrative purposes it is not
necessary to generate an exhaustive set.
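The combinations of settings described above can be enumerated programmatically. The following Python fragment is only a minimal sketch of such an enumeration and is not how the segmentations were produced, since the clustering itself was run in SAS Enterprise Miner. The dictionary keys, the function name build_parameter_grid, and the choice of 2 to 5 as the fixed cluster counts are assumptions made for illustration.

```python
from itertools import product

# Option names are taken from the text; the dictionary keys below are hypothetical.
SIMILARITY_MEASURES = [
    "Least Squares",
    "Mean Absolute Deviation",
    "Mid-Range",
    "Newton",
    "Modified Ekblom-Newton",
]
MERGE_OPTIONS = ["Ward", "Average", "Centroid"]  # varied for Least Squares only
TRANSFORMATIONS = ["None", "Range", "Standard Deviation"]


def build_parameter_grid(fixed_cluster_counts=(2, 3, 4, 5)):
    """Return one dict per candidate parameter setting.

    Least Squares takes a min/max range for the number of clusters;
    the other measures take a single fixed value (assumed here: 2 to 5).
    """
    settings = []
    for transformation in TRANSFORMATIONS:
        for merge_option in MERGE_OPTIONS:
            settings.append({
                "similarity_measure": "Least Squares",
                "merge_option": merge_option,
                "transformation": transformation,
                "min_clusters": 2,
                "max_clusters": 6,
            })
        others = [m for m in SIMILARITY_MEASURES if m != "Least Squares"]
        for measure, k in product(others, fixed_cluster_counts):
            settings.append({
                "similarity_measure": measure,
                "transformation": transformation,
                "n_clusters": k,
            })
    return settings


grid = build_parameter_grid()
```

Each entry in grid corresponds to one candidate run; the size of this grid reflects only the assumptions of the sketch, not the number of runs actually performed.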
Table 2 describes the parameter settings associated with the segmentations of our illustrative example. We do not report
all parameter settings, for one of two reasons: (a) in some cases multiple parameter settings resulted in the same segmentation,
so nothing would be gained by reporting duplicates; (b) in other cases the segmentations were clearly
not intrinsically appropriate for the dataset, particularly in some cases where no variable transformation method (e.g.
Range, Standard Deviation) was used.
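Reason (a) amounts to checking whether two parameter settings assign the observations to the same groups, regardless of how the clusters happen to be numbered. The Python fragment below is a minimal sketch of one way to perform that check; it is an assumption for illustration rather than the procedure used in this study, and the function names and the toy label vectors are hypothetical.

```python
def canonical_labels(labels):
    """Relabel clusters in order of first appearance so that two
    segmentations differing only in cluster numbering compare equal."""
    mapping = {}
    canonical = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
        canonical.append(mapping[label])
    return tuple(canonical)


def group_identical_segmentations(segmentations):
    """Group setting names whose cluster assignments are identical.

    `segmentations` maps a setting name to its list of cluster labels,
    one label per observation, in the same observation order.
    """
    groups = {}
    for name, labels in segmentations.items():
        groups.setdefault(canonical_labels(labels), []).append(name)
    return list(groups.values())


# Toy label vectors (hypothetical, not taken from the dataset).
toy = {
    "setting_A": [1, 1, 2, 2],
    "setting_B": [2, 2, 1, 1],  # same partition as setting_A, renumbered
    "setting_C": [1, 2, 1, 2],
}
groups = group_identical_segmentations(toy)
```

In this toy case, setting_A and setting_B fall into one group, so only one of them would need to be reported.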