Different Types of Clustering
Cluster Analysis is a technique used to divide data into groups called
clusters. Objects within the same cluster are similar to each other, while
objects in different clusters are different from one another.
The main purpose of clustering is to identify patterns and meaningful
groups in data. Sometimes, clustering is also used as an initial step for
data summarization or further analysis.
Cluster analysis plays an important role in many fields such as:
- Biology
- Psychology
- Statistics
- Pattern Recognition
- Machine Learning
- Data Mining
What is Cluster Analysis?
Cluster Analysis is the process of grouping data objects based on the
information present in the dataset.
The goal is:
- Objects within the same group should be similar.
- Objects from different groups should be dissimilar.
However, defining what exactly forms a cluster is not always simple.
The same set of data points can sometimes be grouped in different
ways depending on the method used. Therefore, the best clustering
structure depends on the nature of the data and the objective of the
analysis.
Clustering vs Classification
Clustering is sometimes compared with classification, but they are
different.
Classification
- Uses predefined class labels
- Requires labeled training data
- It is a supervised learning method
Clustering
- Does not require predefined labels
- Groups data based on similarity
- It is an unsupervised learning method
Because of this, clustering is often called unsupervised
classification.
Other Terms Related to Clustering
Some terms are sometimes used as synonyms for clustering, although they often describe simpler grouping approaches:
Segmentation
Dividing data into groups using simple rules.
Example:
- Grouping people based on income
- Segmenting images based on color
Partitioning
Dividing a dataset into smaller parts or subsets.
This term is also used in graph partitioning and other areas.
Different Types of Clustering
Clustering methods can be categorized into several types:
- Hierarchical vs Partitional Clustering
- Exclusive vs Overlapping vs Fuzzy Clustering
- Complete vs Partial Clustering
1. Hierarchical vs Partitional Clustering
Partitional Clustering
In partitional clustering, the dataset is divided into non-overlapping
clusters. Each data object belongs to exactly one cluster.
Clusters do not contain subclusters.
Example:
If we divide 100 customers into 5 groups, each customer belongs to only
one group.
Hierarchical Clustering
In hierarchical clustering, clusters are organized in a tree-like
structure.
- Clusters may contain subclusters
- The structure is called a hierarchy
The top level contains one cluster with all data objects, and as we
move down the tree, clusters are divided into smaller groups.
Important points:
- The root contains all objects
- The leaf nodes contain individual objects
- A hierarchical cluster can be converted into partitional clusters by cutting the tree at a specific level
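The relationship between a hierarchy and its partitions can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes 1-D data, uses single-link (closest pair) merging as one possible merge rule, and the function name single_link_clusters and the sample values are hypothetical. Stopping the merging at k clusters plays the role of cutting the tree at a given level.

```python
from itertools import combinations


def single_link_clusters(points, k):
    """Merge the two closest clusters until only k remain (1-D data)."""
    # Start at the leaves: every object is its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Single link: cluster distance = smallest gap between any two members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                abs(a - b) for a in clusters[pair[0]] for b in clusters[pair[1]]
            ),
        )
        # combinations() guarantees i < j, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


data = [1.0, 1.5, 5.0, 5.4, 9.0]
cut_at_3 = single_link_clusters(data, 3)  # [[1.0, 1.5], [5.0, 5.4], [9.0]]
cut_at_2 = single_link_clusters(data, 2)  # [[1.0, 1.5, 5.0, 5.4], [9.0]]
cut_at_1 = single_link_clusters(data, 1)  # the root: one cluster with everything
```

Each cut is an ordinary partitional clustering, and every cluster at a lower cut is nested inside a cluster at a higher one.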
2. Exclusive vs Overlapping vs Fuzzy Clustering
Exclusive Clustering
In exclusive clustering, each object belongs to only one cluster.
Example:
A student belongs to one class section only.
Overlapping Clustering
In overlapping clustering, an object can belong to multiple
clusters.
Example:
A person can be both an employee and a student trainee in a
company.
This approach is useful when objects naturally belong to more than one
group.
Fuzzy Clustering
In fuzzy clustering, objects belong to clusters with a membership value
between 0 and 1.
For example, an object may have:
- A membership of 0.7 in cluster A
- A membership of 0.3 in cluster B
Usually, the membership values for each object are required to sum to 1.
Fuzzy clustering is useful when boundaries between clusters are
unclear.
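One common way to obtain such membership values is the weighting rule used by fuzzy c-means, where membership falls off with distance to each cluster center. The sketch below assumes 1-D data, fixed and distinct centers, and the usual fuzzifier m = 2. The function name fuzzy_memberships and the numbers are illustrative only.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Membership of a 1-D point in each cluster (fuzzy c-means weighting)."""
    distances = [abs(point - c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster
    # (assumes the centers are distinct).
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    # Closer centers get larger weights; normalizing makes the weights sum to 1.
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


centers = [0.0, 10.0]
memberships = fuzzy_memberships(4.0, centers)
# The point at 4.0 is closer to the first center, so it gets the larger share
# (about 0.69 versus 0.31), and the two values add up to 1.
```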
3. Complete vs Partial Clustering
Complete Clustering
In complete clustering, every data object is assigned to a cluster.
Example:
Grouping all documents in a dataset into topics.
Partial Clustering
In partial clustering, some objects may not belong to any cluster.
These objects may be:
- Noise
- Outliers
- Irrelevant data
Example:
When analyzing news articles, only articles related to important topics
may be grouped into clusters, while others may be ignored.
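A simple way to picture partial clustering is a nearest-center assignment with a distance cutoff: objects that are too far from every center stay unassigned. The sketch below is only one possible rule, assuming 1-D data and known centers. The name partial_assign and the max_distance threshold are hypothetical.

```python
def partial_assign(points, centers, max_distance):
    """Assign each 1-D point to its nearest center, or None if it is too far."""
    labels = []
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        # Points beyond the cutoff are treated as noise and left unclustered.
        within_reach = abs(p - centers[nearest]) <= max_distance
        labels.append(nearest if within_reach else None)
    return labels


points = [1.0, 1.2, 9.8, 10.1, 50.0]
labels = partial_assign(points, centers=[1.0, 10.0], max_distance=2.0)
# labels == [0, 0, 1, 1, None]; the point at 50.0 belongs to no cluster.
```

A complete clustering corresponds to dropping the cutoff, so that every point receives a label.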
Different Types of Clusters
Different clustering methods define clusters in different ways.
Main cluster types include:
- Well-separated clusters
- Prototype-based clusters
- Graph-based clusters
- Density-based clusters
- Shared-property (Conceptual) clusters
1. Well-Separated Clusters
In this type, each object in a cluster is closer to every other object in
the same cluster than to any object in another cluster.
Characteristics:
- Clear separation between clusters
- Clusters can have any shape
- This type works well when clusters are far apart from each other.
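The well-separated condition can be checked directly for a labeled dataset. The sketch below uses the strict form of the definition (every same-cluster object must be closer than any object from another cluster) and assumes 1-D data. The helper name is_well_separated is hypothetical.

```python
def is_well_separated(points, labels):
    """True if each point is closer to all cluster-mates than to any outsider."""
    for i, p in enumerate(points):
        same = [
            abs(p - q)
            for j, q in enumerate(points)
            if j != i and labels[j] == labels[i]
        ]
        other = [abs(p - q) for j, q in enumerate(points) if labels[j] != labels[i]]
        # Fail if the farthest cluster-mate is not closer than the nearest outsider.
        if same and other and max(same) >= min(other):
            return False
    return True


labels = [0, 0, 0, 1, 1, 1]
far_apart = is_well_separated([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], labels)  # True
touching = is_well_separated([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], labels)      # False
```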
2. Prototype-Based Clusters
In this approach, each cluster is represented by a
prototype.
The prototype can be:
Centroid
The average of all data points in the cluster.
Medoid
The most representative data point in the cluster.
Objects are assigned to the cluster whose prototype is closest to
them.
These clusters are often called center-based clusters and tend to
have a roughly spherical (globular) shape.
Example algorithms:
- K-Means
- K-Medoids
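The difference between the two prototypes, and the nearest-prototype assignment rule, can be illustrated as follows. This is a minimal sketch using Euclidean distance on 2-D tuples (math.dist requires Python 3.8 or later). The function names and sample points are illustrative, and it is not a full K-Means or K-Medoids implementation.

```python
from math import dist


def centroid(cluster):
    """Coordinate-wise mean; usually not an actual data point."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def medoid(cluster):
    """The member with the smallest total distance to all other members."""
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))


def nearest_prototype(point, prototypes):
    """Index of the prototype closest to the point."""
    return min(range(len(prototypes)), key=lambda i: dist(point, prototypes[i]))


cluster_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.5)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

center_a = centroid(cluster_a)  # approximately (1.0, 1.1), not a member
medoid_a = medoid(cluster_a)    # (1.0, 1.5), always an actual member

prototypes = [center_a, centroid(cluster_b)]
assigned = nearest_prototype((7.0, 7.0), prototypes)  # 1, i.e. cluster_b
```

Because the medoid must be one of the data points, it can be used even when an average of the objects is not meaningful.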
3. Graph-Based Clusters
In this method, data is represented as a graph:
- Nodes represent data objects
- Edges represent connections or similarity
A cluster is defined as a connected component of the graph: a group of objects that are connected to one another but not to any object outside the group.
One common example is contiguity-based clustering, where objects are
connected if they are within a certain distance.
A limitation of this method is that noise points may connect
different clusters, creating incorrect groupings.
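Both the contiguity-based definition and its weakness to noise can be seen in a small sketch. The code below links 1-D points that lie within a chosen threshold and returns the connected components. The name contiguity_clusters, the max_gap threshold, and the data are assumptions made for illustration.

```python
def contiguity_clusters(points, max_gap):
    """Connected components of the graph linking 1-D points within max_gap."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        component, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            # Unvisited points within max_gap of the current one join the component.
            neighbors = [
                j for j in unvisited if abs(points[current] - points[j]) <= max_gap
            ]
            for j in neighbors:
                unvisited.remove(j)
            component.extend(neighbors)
            frontier.extend(neighbors)
        clusters.append(sorted(points[i] for i in component))
    return sorted(clusters)


clean = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
with_noise = clean + [3.5, 5.0, 6.5]  # a thin chain of noise between the groups

two_groups = contiguity_clusters(clean, max_gap=1.5)
merged = contiguity_clusters(with_noise, max_gap=1.5)
```

With the clean data the two groups are found separately. With the three noise points added, they form a bridge and everything collapses into one component.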
4. Density-Based Clusters
In density-based clustering, clusters are defined as regions with
high density of points, separated by regions with low
density.
Characteristics:
- Can detect irregular-shaped clusters
- Handles noise and outliers
Example algorithm:
- DBSCAN
Density-based clustering works well when clusters are irregular or
intertwined, and when noise and outliers are present.
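The idea behind DBSCAN can be sketched compactly. This is a simplified, unoptimized illustration rather than a production implementation: it uses brute-force neighbor search, Euclidean distance (math.dist, Python 3.8 or later), and counts a point as its own neighbor. The parameter names eps and min_pts follow common usage, and the sample data is made up.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: one label per point, NOISE for outliers."""

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),          # dense region A
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),  # dense region B
    (5.0, 5.0),                                              # isolated point
]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point is left as noise, so the result is also an example of a partial clustering.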
5. Shared-Property (Conceptual) Clusters
In this type, objects in a cluster share a common property or
concept.
Example:
- Documents discussing the same topic
- Products belonging to the same category
These clusters are discovered using conceptual clustering, which
focuses on understanding the meaning or concept behind the data
rather than just distance.