Here’s how it works:
- Calculate the original pairwise distances or dissimilarities: This is typically done with a distance metric such as Euclidean distance, or with a correlation-based dissimilarity (for example, 1 minus the correlation coefficient), depending on the nature of the data.
- Perform hierarchical clustering: Apply a hierarchical clustering algorithm to the data to create a dendrogram, which represents the hierarchical structure of the clustering solution.
- Calculate the cophenetic distances: These are the distances between each pair of data points implied by the dendrogram structure. The cophenetic distance between two points is the height in the dendrogram at which their branches first merge into a single cluster.
- Compute the correlation between original distances and cophenetic distances: Finally, the cophenetic coefficient is computed as the Pearson correlation coefficient between the original pairwise distances and the cophenetic distances. This coefficient ranges between -1 and 1, with 1 indicating perfect preservation of original distances and values closer to 0 indicating poor preservation.
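The four steps above can be sketched in a few lines with SciPy. This is a minimal illustration, not a prescribed implementation: the toy data `X`, the Euclidean metric, and the average linkage method are all assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Toy data (an assumption for illustration): two loose groups of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 2)),
    rng.normal(5.0, 1.0, size=(10, 2)),
])

# Step 1: original pairwise distances, in condensed (1-D) form
original = pdist(X, metric="euclidean")

# Step 2: hierarchical clustering; Z encodes the dendrogram
Z = linkage(original, method="average")

# Steps 3 and 4: cophenetic distances and their correlation with the originals
coph_corr, coph_dists = cophenet(Z, original)

# The same coefficient computed by hand as a Pearson correlation
manual = np.corrcoef(original, coph_dists)[0, 1]

print(f"cophenetic coefficient: {coph_corr:.3f}")
print(f"manual Pearson check:   {manual:.3f}")
```

When the original distances are passed as the second argument, `cophenet` returns both the coefficient and the condensed cophenetic distances, so the manual Pearson check should agree with it up to floating-point error.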
A high cophenetic coefficient suggests that the dendrogram faithfully preserves the pairwise dissimilarities in the original data. It is a useful measure for comparing different clustering solutions, such as those produced by different linkage methods, or for assessing how well a particular hierarchical clustering algorithm suits a given dataset.
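As a rough sketch of such a comparison, the snippet below computes the coefficient for several linkage methods on the same distances. The random data and the particular set of methods are assumptions for illustration; which method scores highest depends on the dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Hypothetical data: 30 random points in 3 dimensions
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
D = pdist(X)  # Euclidean distances by default

# Cophenetic coefficient for each candidate linkage method
scores = {}
for method in ("single", "complete", "average"):
    Z = linkage(D, method=method)
    scores[method], _ = cophenet(Z, D)

# Method whose dendrogram best preserves the original distances
best = max(scores, key=scores.get)

for method, score in scores.items():
    print(f"{method:>8}: {score:.3f}")
print("best by cophenetic coefficient:", best)
```

A higher score only means the dendrogram distorts the original distances less; it does not by itself show that the resulting clusters are meaningful.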