Grid-Based Method in Data Mining
gocourse.in Maintenance

We'll be back soon

Our CDN (cdn.gocourse.in) is currently unreachable. Some images, JavaScript, or CSS files may not load properly.

Estimated downtime: ~30 minutes

Grid-Based Method in Data Mining

Vinithra

Grid-Based Method in Data Mining  

The grid-based clustering method is a technique used in data mining to group data points. Instead of looking at each data point individually, it divides the entire data space into a fixed number of small regions called cells (or grids).

All clustering operations are performed on these grid cells, not directly on the data points. This makes the method very fast, because the processing time depends on the number of grid cells, not the number of data points

Key Idea: 

  • Divide data space into grids
  • Perform clustering on grids instead of individual points
  • Faster and efficient for large datasets 

Examples of Grid-Based Methods

Some popular grid-based approaches include: 

STING (Statistical Information Grid) – Uses statistical information stored in grid cells
WaveCluster – Uses wavelet transformation for clustering
CLIQUE – Works for high-dimensional data using grid and density concepts  

Basics of Grid-Based Methods

Grid-based methods are especially useful when working with multi-dimensional data, such as:
  • Geographic data
  • Image data
  • Data with many attributes 
Dividing data into grids provides several advantages: 

1. Data Partitioning

Data is divided into multiple groups (clusters) based on similarity 
Each partition represents a cluster
The number of clusters (K) is often predefined 

Examples of partitioning algorithms:

K-Means
PAM (K-Medoids)
CLARA 

2. Data Reduction

Reduces the size of large datasets 
Removes unnecessary or duplicate data 
Keeps only important information

This helps in faster processing and better performance. 

3. Local Pattern Discovery 

Helps find patterns within specific regions (cells)
Useful for identifying hidden trends in small areas
Good for detecting localized behaviors in data

4. Scalability

Works well with large datasets 
Efficient for high-dimensional data 
Grid structure simplifies complex data

5. Density Estimation

Groups are formed based on dense regions of data 
Sparse regions are treated as noise

Key concepts:

ε (epsilon) → Radius of neighborhood 
MinPts → Minimum number of points required

If a region has enough points → it forms a cluster
Otherwise → considered noise 

6. Clustering and Classification

Uses grid cells instead of individual data points
Improves processing speed
Efficient for large datasets

7. Grid-Based Indexing

Organizes data using grid structure
Helps in faster data retrieval and searching
Improves query performance

Popular Grid-Based Methods 

1. K-Means Clustering

K-Means is a simple and widely used clustering algorithm.

Key Points: 

  • It is an unsupervised learning algorithm 
  • The number of clusters (K) is predefined
  • Each cluster has a centroid (center point)
  • Goal: Minimize distance between points and their centroid 

How K-Means Works 

Step 1: Choose number of clusters (K) 
Step 2: Select K random centroids
Step 3: Assign each data point to nearest centroid
Step 4: Recalculate centroids
Step 5: Repeat steps 3 & 4 until no change

2. DBSCAN (Density-Based Clustering)

DBSCAN groups data based on density.

Key Concepts:

Eps (ε): Radius of neighborhood
MinPts: Minimum number of points required

Types of Points: 

Core Point: Has enough neighbors
Border Point: Close to core point
Noise: Outliers
Important Condition:

A point is a core point if:

Number of points in its ε-neighborhood ≥ MinPts

3. STING (Statistical Information Grid)

STING divides space into multiple grid levels.

Key Features: 

  • Uses rectangular grid cells
  • Each cell stores statistical values like: 
  • Mean
  • Min
  • Max
  • Works in a hierarchical structure (top → bottom levels) 

How STING Works

Step 1: Select a starting layer
Step 2: Calculate probability for each cell
Step 3: Mark cells as relevant or not
Step 4: Move to lower level if needed
Step 5: Repeat for relevant cells
Step 6: Retrieve data from selected cells
Step 7: Return final result  

Summary

  • Grid-based methods divide data into cells 
  • They are fast and efficient for large datasets 
  • Work well for multi-dimensional data 
  • Popular methods include K-Means, DBSCAN, and STING 

 

Our website uses cookies to enhance your experience. Learn More
Accept !