SUMMARY: Measures of distance between samples: Euclidean
1. Pythagoras’ theorem extends to vectors in multidimensional space: the squared length
of a vector is the sum of squares of its coordinates.
2. As a consequence, the squared distance between two vectors in multidimensional space
is the sum of squared differences in their coordinates. This multidimensional
distance is called the Euclidean distance, and is the natural generalization of our
three-dimensional notion of physical distance to more dimensions.
3. When variables are on different measurement scales, standardization is necessary to
balance the contributions of the variables in the computation of distance. The
Euclidean distance computed on standardized variables is called the standardized
Euclidean distance.
4. Standardization in the calculation of distances can equivalently be thought of as
weighting the variables. This leads to the notion of Euclidean distances with any choice
of weights, called weighted Euclidean distance.
5. A particular weighted Euclidean distance applicable to count data is the chi-square
distance, which is calculated between the relative counts for each sample, called
profiles, and weights each variable by the inverse of the variable’s overall mean count.
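
The five points above can be illustrated with a minimal sketch in plain Python. The function names (euclidean, weighted_euclidean, column_sds, standardized_euclidean, chi_square_distance) are hypothetical and chosen only for this example. Two assumptions are made: standardization divides each variable by its sample standard deviation (with n - 1 in the denominator), and the chi-square weights are the inverses of the average profile, that is, each column total divided by the grand total.

```python
import math

def euclidean(x, y):
    # Points 1-2: square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_euclidean(x, y, w):
    # Point 4: each squared difference is multiplied by a weight w_j.
    return math.sqrt(sum(wj * (a - b) ** 2 for a, b, wj in zip(x, y, w)))

def column_sds(rows):
    # Sample standard deviation of each variable (column); assumes n >= 2.
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    return [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]

def standardized_euclidean(x, y, sds):
    # Point 3: the difference of standardized values is (a - b) / s_j,
    # which is the same as weighting variable j by 1 / s_j**2.
    return weighted_euclidean(x, y, [1.0 / s ** 2 for s in sds])

def chi_square_distance(counts, i, k):
    # Point 5: distance between the profiles (relative counts) of rows i and k,
    # weighting each variable by the inverse of its average profile value
    # (an assumption of this sketch: column total / grand total).
    row_totals = [sum(r) for r in counts]
    grand_total = sum(row_totals)
    avg_profile = [sum(c) / grand_total for c in zip(*counts)]
    p_i = [v / row_totals[i] for v in counts[i]]
    p_k = [v / row_totals[k] for v in counts[k]]
    return weighted_euclidean(p_i, p_k, [1.0 / c for c in avg_profile])

# Hypothetical data: three samples, two variables on very different scales.
data = [[1, 10], [2, 20], [3, 30]]
sds = column_sds(data)
print(euclidean(data[0], data[1]))
print(standardized_euclidean(data[0], data[1], sds))

# Hypothetical count table: two samples, two categories.
counts = [[10, 20], [20, 10]]
print(chi_square_distance(counts, 0, 1))
```

In the first data set the unstandardized distance is dominated by the second variable, while after standardization both variables contribute equally, which is the balancing effect described in point 3.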