There's more…

To decide which clusters should be combined, we need a measure of dissimilarity between clusters. Most hierarchical clustering methods use a metric to quantify the distance between pairs of elements, together with a linkage criterion that defines the dissimilarity of two sets of elements (clusters) as a function of the distances between pairs of elements drawn from the two sets.
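As a minimal sketch of how a linkage criterion turns pairwise distances into a cluster-to-cluster dissimilarity, the following uses three widely used criteria (single, complete, and average linkage). These criteria are not named in the text above and are shown here only as illustrations; the function name `linkage_distance` and the sample points are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def linkage_distance(cluster_a, cluster_b, criterion="single", metric=dist):
    """Dissimilarity of two clusters as a function of pairwise distances.

    `criterion` picks how the pairwise distances are summarised:
    "single" (minimum), "complete" (maximum), or "average" (mean).
    """
    # Distance between every element of one cluster and every element of the other.
    pairwise = [metric(p, q) for p, q in product(cluster_a, cluster_b)]
    if criterion == "single":
        return min(pairwise)
    if criterion == "complete":
        return max(pairwise)
    if criterion == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage criterion: {criterion!r}")


# Hypothetical sample clusters, for illustration only.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]

for name in ("single", "complete", "average"):
    print(name, linkage_distance(a, b, name))
```

Swapping the `metric` argument for another distance function changes the element-level distance without touching the linkage logic, which mirrors the separation between metric and criterion described above.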

Common metrics are as follows:

  • The Euclidean distance
  • The Manhattan distance 
  • The uniform norm (also known as the maximum or Chebyshev distance)
  • The Mahalanobis distance, which corrects for differences in scale and for correlations among the variables
  • The angle between two vectors
  • The Hamming distance, which measures the minimum number of substitutions required to change one member into another of the same length
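To make the definitions concrete, here is a small sketch of several of these metrics in plain Python. The function names and sample values are hypothetical, and `chebyshev` assumes that the uniform entry in the list refers to the maximum-norm distance. The Mahalanobis distance is left out because it needs an inverse covariance matrix estimated from the data.

```python
import math


def euclidean(u, v):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    # Largest absolute coordinate difference (maximum norm).
    return max(abs(a - b) for a, b in zip(u, v))


def angle(u, v):
    # Angle in radians between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos)))


def hamming(s, t):
    # Number of positions at which two equal-length sequences differ.
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


# Hypothetical sample vectors, for illustration only.
u, v = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(euclidean(u, v), manhattan(u, v), chebyshev(u, v), angle(u, v))
print(hamming("karolin", "kathrin"))
```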