Statistical analyses
Cluster analysis was used to identify distinct health behaviour
clusters. This multivariate analysis can be useful for finding
homogeneous subgroups within heterogeneous samples.32
The procedure employed was in accordance with the most
recent developments in cluster analysis.33 Because the number
of identifiable clusters was not known a priori, Ward's
agglomerative hierarchical clustering was used, as it is particularly
suitable for binary data.34,35 The Ward method first
treated each individual observation as its own cluster. These
clusters were then successively merged, on the basis of a
proximity measure and a predefined fusion algorithm, until one
large cluster remained.32 To enable identification of robust groups of
observations, the fusion algorithm was stopped at the point
where the individual clusters were as homogeneous as possible
internally and as heterogeneous as possible in relation
to all the other clusters.34,36 The established measures R²,
semi-partial R², pseudo F and pseudo t² statistics were used as
the criteria for decisions regarding the total number of
clusters. Finally, root mean square standard deviation
(RMSSTD) was calculated as a measure of homogeneity.
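
The agglomeration and the stopping criteria described above can be sketched as follows. This is a minimal illustration in plain Python, not the SAS procedure used in the study: the six binary observations are invented toy data, and the names within_ss, merge_cost and ward_history are hypothetical helpers. At each step the pair of clusters whose merger least increases the within-cluster sum of squares is fused (the Ward criterion), and R², semi-partial R², pseudo F and RMSSTD are recorded for the resulting solution. The pseudo t² statistic is omitted for brevity.

```python
from itertools import combinations
from math import sqrt


def within_ss(rows):
    """Sum of squared deviations of the rows from their own centroid."""
    centroid = [sum(col) / len(rows) for col in zip(*rows)]
    return sum((x - m) ** 2 for row in rows for x, m in zip(row, centroid))


def merge_cost(a, b):
    """Ward criterion: increase in within-cluster SS if a and b are merged."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)


def ward_history(data):
    """Agglomerate until one cluster remains; return one record per merge."""
    clusters = [[row] for row in data]   # every observation starts as a cluster
    total_ss = within_ss(data)           # total sum of squares
    n, p = len(data), len(data[0])
    history = []
    while len(clusters) > 1:
        # pick the pair whose fusion increases the within-cluster SS the least
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        delta = merge_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        g = len(clusters)
        pooled = sum(within_ss(c) for c in clusters)
        pseudo_f = None                  # undefined for g == 1 or zero within-SS
        if g > 1 and pooled > 0:
            pseudo_f = ((total_ss - pooled) / (g - 1)) / (pooled / (n - g))
        history.append({
            "n_clusters": g,
            "semi_partial_r2": delta / total_ss,  # loss of homogeneity at this merge
            "r2": 1 - pooled / total_ss,          # variance explained by g clusters
            "pseudo_f": pseudo_f,
            "rmsstd": sqrt(within_ss(merged) / (p * (len(merged) - 1))),
        })
    return history


# Toy data (assumption): 6 respondents x 3 binary health behaviours
toy = [[1, 1, 0], [1, 1, 0], [1, 1, 1],
       [0, 0, 1], [0, 0, 0], [0, 0, 1]]
history = ward_history(toy)
for step in history:
    print(step)
```

In output of this kind, a sharp drop in R² together with a large semi-partial R² at a given merge suggests that two dissimilar clusters were fused, so the solution just before that merge is a candidate for the final number of clusters.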
A post hoc analysis examined whether the groups identified
by cluster analysis could be characterized on the basis of
members’ specific social attributes. A multinomial logistic
regression with stepwise selection was used for this purpose.37
In this model, the odds of cluster membership were modelled for
one social attribute at a time while the other factors were held
constant. The resulting coefficients can be interpreted as changes
in the log-odds of membership of the analysed cluster vs. the
reference category (Cluster 1).
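
To make the interpretation of such coefficients concrete, the following sketch converts multinomial logit coefficients into odds ratios and membership probabilities. It is illustrative only: the coefficient values, the single attribute "female" and the helper membership_probabilities are assumptions made for the example and are not results or variables from this study. The reference category (Cluster 1) has its linear predictor fixed at zero.

```python
from math import exp

# Hypothetical coefficients (assumption): log-odds of membership in each
# cluster relative to the reference category, Cluster 1.
coefs = {
    "Cluster 2": {"intercept": -0.5, "female": 0.7},
    "Cluster 3": {"intercept": -1.0, "female": -0.4},
}


def membership_probabilities(x):
    """Return the predicted probability of each cluster for attribute values x."""
    scores = {"Cluster 1": 0.0}          # reference category: linear predictor 0
    for cluster, beta in coefs.items():
        scores[cluster] = beta["intercept"] + sum(beta[k] * v for k, v in x.items())
    denom = sum(exp(s) for s in scores.values())
    return {cluster: exp(s) / denom for cluster, s in scores.items()}


# exp(coefficient) is the odds ratio vs. Cluster 1 for a one-unit change
odds_ratios = {cluster: exp(beta["female"]) for cluster, beta in coefs.items()}

p_male = membership_probabilities({"female": 0})
p_female = membership_probabilities({"female": 1})

print(odds_ratios)
print(p_male)
print(p_female)
```

The odds ratio is constant across covariate values, whereas the corresponding change in membership probability depends on the values of the other terms in the model.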
In accordance with standard statistical procedure, bivariate
and multivariate analyses were performed only on complete datasets
(n = 1889). All tests were two-tailed, with a significance level of
0.05. The analyses were conducted using the statistical
program SAS 9.1.