What is Binning in Data Mining

Dhanapriya D


Binning in Data Mining

What is Binning?

Binning (also called data discretization or bucketing) is a data preprocessing technique used in data mining to reduce noise in data.

In this method, a large set of numerical values is sorted and divided into smaller groups called bins. The values in each bin can then be replaced with a representative value such as the bin mean, the bin median, or the nearest bin boundary.

This process smooths the data and can help improve the performance of data analysis and machine learning models.
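
The smoothing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the fixed bin size of 3, and the sample data are all assumptions made for the example.

```python
from statistics import mean

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

def smooth_by_bin_boundaries(values, bin_size):
    """Replace every value with the closer of its bin's minimum or maximum."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        low, high = bin_values[0], bin_values[-1]
        for v in bin_values:
            # Ties go to the lower boundary (an arbitrary choice for this sketch)
            smoothed.append(low if v - low <= high - v else high)
    return smoothed

# Illustrative data, not taken from the article
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))       # [9, 9, 9, 22, 22, 22, 29, 29, 29]
print(smooth_by_bin_boundaries(prices, 3))  # [4, 4, 15, 21, 21, 24, 25, 25, 34]
```

Smoothing by means removes more variation inside each bin, while smoothing by boundaries keeps the extreme values of each bin unchanged.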

Simple Explanation:

Binning groups similar values together into intervals.

For example, instead of storing individual ages like:
21, 23, 25, 27, 29

We can group them into bins such as:
20–25
26–30

This makes the data easier to analyze.
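
As a rough sketch of the grouping above, the following Python snippet maps each age to the label of its bin. The names `lower_edges`, `labels`, and `age_bin` are hypothetical, and the code assumes whole-number ages between 20 and 30.

```python
from bisect import bisect_right

# Lower edge of each bin and a matching label (both chosen for this example)
lower_edges = [20, 26]
labels = ["20-25", "26-30"]

def age_bin(age):
    """Return the label of the bin that contains the given age."""
    if not 20 <= age <= 30:
        raise ValueError("age is outside the binned range 20-30")
    # bisect_right counts how many lower edges are <= age
    return labels[bisect_right(lower_edges, age) - 1]

ages = [21, 23, 25, 27, 29]
print([age_bin(a) for a in ages])
# ['20-25', '20-25', '20-25', '26-30', '26-30']
```

Five distinct ages are reduced to two categories, which is the kind of simplification binning is meant to provide.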

Why is Binning Used?

Binning is used for several reasons:
  • Reduce noise in the data
  • Simplify complex datasets
  • Improve model performance in some cases
  • Reduce the risk of overfitting, especially in small datasets
  • Convert numerical data into categorical data
  • Isolate outliers or missing values by placing them in their own bins

Purpose of Binning

The main purpose of binning is to reduce the number of distinct data values by grouping similar values together.

This helps in:
  • Faster data processing
  • Better visualization
  • Clearer relationships between input variables and the target in machine learning models

Binning in Image Processing

In image processing, binning refers to combining multiple pixels into a single larger pixel.

For example:
In 2 × 2 binning, four pixels are merged into one pixel.

Advantages:
  • Reduces the amount of image data
  • Improves image brightness, because each merged pixel combines the signal of several pixels
  • Reduces noise in images
Disadvantage:
  • Image resolution becomes lower.
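
A minimal sketch of 2 × 2 binning on a small grayscale image stored as a list of rows. The function name `bin_pixels_2x2` and the sample image are made up for this example, and summing the four pixels is one possible choice (averaging them is another).

```python
def bin_pixels_2x2(image):
    """Sum each non-overlapping 2 x 2 block of pixel intensities."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even for 2 x 2 binning")
    return [
        [
            image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# Illustrative 4 x 4 image
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(bin_pixels_2x2(image))  # [[14, 22], [46, 54]]
```

The 4 × 4 input becomes a 2 × 2 output: a quarter of the data, with larger intensity values per pixel and half the resolution in each direction.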

Supervised Binning

Supervised binning is an advanced binning method used in machine learning.

In this method:
  • The bin boundaries are created using the target variable.
  • A decision tree is often used to determine the best bin divisions.
This can improve prediction accuracy because the bins reflect the relationship between the input feature and the target variable.
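
To illustrate the idea, the sketch below finds a single bin boundary the way a one-level decision tree would: it tries every midpoint between neighbouring feature values and keeps the one with the lowest weighted Gini impurity. The function names, the choice of Gini impurity, the 0/1 target, and the sample data are all assumptions for this example; a full decision tree would repeat the split inside each resulting bin.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, targets):
    """Return the threshold that minimises the weighted Gini impurity
    of the two bins it creates, or None if no split is possible."""
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold

# Illustrative data: customer age and whether a purchase was made
ages = [22, 25, 30, 35, 40, 45, 50, 55]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_split(ages, bought))  # 37.5
```

Here the boundary 37.5 separates the two classes perfectly, so the bins "below 37.5" and "37.5 and above" carry all the information the model needs from this feature.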

Example of Binning

A common example of binning is a histogram.
A histogram groups data into intervals and shows how frequently values fall within each interval.

Example:

Marks of students:
45, 50, 52, 55, 60, 65, 70

Bins may be:
40–49
50–59
60–70

This helps visualize the distribution of marks.
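
A minimal sketch of counting the marks above into three bins of width 10 starting at 40. It assumes that a boundary mark such as 50 or 60 counts toward the higher bin and that 70 is kept in the last bin; the function name `histogram_counts` and its parameters are hypothetical.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    end = start + width * n_bins
    counts = [0] * n_bins
    for v in values:
        if not start <= v <= end:
            raise ValueError(f"{v} is outside the range {start} to {end}")
        # A value on the upper edge of the range is placed in the last bin
        index = min(int((v - start) // width), n_bins - 1)
        counts[index] += 1
    return counts

marks = [45, 50, 52, 55, 60, 65, 70]
counts = histogram_counts(marks, start=40, width=10, n_bins=3)
print(counts)  # [1, 3, 3]

# Text version of the histogram: one star per student
for i, count in enumerate(counts):
    print(f"bin starting at {40 + 10 * i}: {'*' * count}")
```

One student scored in the forties and three each in the other two bins, which is the distribution a plotted histogram would show.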

Methods of Binning

There are two main methods used to divide data into bins.

1. Equal Frequency Binning

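In this method, the sorted data is split so that each bin contains the same number of data values (or as close to the same number as possible when the data does not divide evenly).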

Example:

Input data:
[5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

Output bins:
Bin 1: [5, 10, 11, 13]
Bin 2: [15, 35, 50, 55]
Bin 3: [72, 92, 204, 215]

Each bin contains four values.

2. Equal Width Binning

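In this method, each bin has the same range (width).
The bin width is calculated using the formula:

w = (max - min) / m

where max and min are the largest and smallest values in the data and m is the number of bins.
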
Example:

Input data:
[5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

Output bins:
Bin 1: [5, 10, 11, 13, 15, 35, 50, 55, 72]
Bin 2: [92]
Bin 3: [204, 215]

The bin width here is (215 - 5) / 3 = 70, so the bin edges are 5, 75, 145, and 215, and each bin covers the same range of values.

Implementation of Binning (Python Example)

Below is a simple Python program that demonstrates binning techniques.

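# Equal Frequency Binning
def equifreq(arr1, m):
    # Sort first so each bin holds neighbouring values
    arr1 = sorted(arr1)
    # Base bin size; the first `extra` bins take one more value,
    # so no values are dropped when len(arr1) is not divisible by m
    n, extra = divmod(len(arr1), m)
    start = 0
    for i in range(m):
        end = start + n + (1 if i < extra else 0)
        print(arr1[start:end])
        start = end

# Equal Width Binning
def equiwidth(arr1, m):
    min1 = min(arr1)
    # Keep the width as a float so the last bin reaches max(arr1)
    w = (max(arr1) - min1) / m
    result = [[] for _ in range(m)]
    for j in arr1:
        # Each value goes into exactly one bin: a value on an inner edge
        # goes to the higher bin, and the maximum goes to the last bin
        i = min(int((j - min1) / w), m - 1) if w else 0
        result[i].append(j)
    print(result)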

# Data to be binned
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

# Number of bins
m = 3
print("Equal Frequency Binning")
equifreq(data, m)
print("\nEqual Width Binning")
equiwidth(data, m)

Output:

Equal Frequency Binning
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]

Equal Width Binning
[[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]

