Hierarchical Clustering in Data Mining

Balaji. K

Hierarchical Clustering in Data Mining

Hierarchical clustering is an unsupervised learning technique used to group similar data points into clusters. It organizes the data in a tree-like structure called a dendrogram. In this method, clusters are formed step by step based on the similarity between data points. At the beginning, each data point is treated as a separate cluster. Gradually, similar clusters are merged together or split apart depending on the clustering method used.

Hierarchical clustering mainly has two types:

Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering

1. Agglomerative Hierarchical Clustering

Agglomerative clustering is the most commonly used hierarchical clustering technique. It follows a bottom-up approach, where each data point starts as an individual cluster and similar clusters are gradually merged together.

This method is also called AGNES (Agglomerative Nesting). At every step, the two clusters that are most similar are combined until all data points form one single cluster.

Algorithm for Agglomerative Hierarchical Clustering

Calculate the similarity or distance between all data points (create a proximity matrix).
Consider each data point as an individual cluster.
Find the two clusters that are most similar.
Merge those clusters into a single cluster.
Recalculate the distances between the new cluster and other clusters.
Repeat steps 3–5 until only one cluster remains.

Example Explanation

Assume we have six data points:

P, Q, R, S, T, V

Step 1:

Each point is treated as a separate cluster:

(P), (Q), (R), (S), (T), (V)

Step 2:

The closest clusters are merged. Suppose Q and R are similar.

Clusters become:

(P), (QR), (S), (T), (V)

Step 3:

Next, S and T are merged.

Clusters become:

(P), (QR), (ST), (V)

Step 4:

Clusters ST and V are merged.

Clusters become:

(P), (QR), (STV)

Step 5:

Clusters QR and STV are merged.

Clusters become:(P), (QRSTV)

Step 6:

Finally, P merges with QRSTV to form one cluster:

(PQRSTV)

This merging process can be visualized using a dendrogram diagram.

2. Divisive Hierarchical Clustering

Divisive hierarchical clustering works in the opposite way compared to agglomerative clustering.

It follows a top-down approach:

Initially, all data points are placed in one cluster.

The algorithm then splits the cluster into smaller clusters based on dissimilarity.
This splitting continues until each data point becomes its own cluster or the desired number of clusters is reached a dendrogram diagram.

Advantages of Hierarchical Clustering

Easy to understand and implement.
Produces a hierarchical structure (dendrogram) which provides more information about data relationships.
The number of clusters does not need to be specified in advance.

Disadvantages of Hierarchical Clustering

Not suitable for very large datasets.
Difficult to handle clusters with different sizes and shapes.
Sensitive to noise and outliers.
Once clusters are merged or split, the process cannot be reversed.

« Previous Next »

Hierarchical Clustering in Data Mining

Hierarchical Clustering in Data Mining

Hierarchical clustering mainly has two types:

1. Agglomerative Hierarchical Clustering

Algorithm for Agglomerative Hierarchical Clustering

Example Explanation

2. Divisive Hierarchical Clustering

Advantages of Hierarchical Clustering

Disadvantages of Hierarchical Clustering

Translate

Related course

Social Plugin

Ads

Ads

Website by

Categories

Our Services

Footer Copyright

Contact form

Hierarchical Clustering in Data Mining

Hierarchical Clustering in Data Mining

Hierarchical clustering mainly has two types:

1. Agglomerative Hierarchical Clustering

Algorithm for Agglomerative Hierarchical Clustering

Example Explanation

2. Divisive Hierarchical Clustering

Advantages of Hierarchical Clustering

Disadvantages of Hierarchical Clustering

You may like these posts

Footer Copyright

Contact form