Attribute Selection Measures in Data Mining

shareef

In this article, we will learn about attribute selection measures in a simple way.

What is Attribute Selection?

Attribute selection is also known as:

Feature selection
Variable selection

It is an important concept in data mining, especially when building decision trees.

Attributes (or features) are the columns in a dataset. Sometimes, datasets contain:

Irrelevant data
Duplicate information
Noise (unwanted data)

These can reduce the performance of a model and make learning difficult.

Attribute selection helps by:

Removing unnecessary data
Improving model accuracy
Reducing overfitting
Making the model easier to understand

Why is it Important in Decision Trees?

In a decision tree, we need to decide:

Which attribute should be used first (root node)
How to split the data at each step

Attribute selection measures help us choose the best attribute for splitting the data.
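As a rough illustration of that choice, the sketch below scores each attribute and picks the highest-scoring one as the root. It uses information gain, one of the measures covered in this article. The tiny dataset, its column names (Gender, Weather, Play), and the helper names (entropy, information_gain, best_attribute) are all made up for this example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    labels = [row[target] for row in rows]
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

def best_attribute(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical toy dataset, invented only for illustration.
rows = [
    {"Gender": "M", "Weather": "Sunny", "Play": "Yes"},
    {"Gender": "F", "Weather": "Sunny", "Play": "Yes"},
    {"Gender": "M", "Weather": "Rainy", "Play": "No"},
    {"Gender": "F", "Weather": "Rainy", "Play": "No"},
]

root = best_attribute(rows, ["Gender", "Weather"], "Play")
print(root)  # Weather
```

Here Weather separates the classes perfectly while Gender does not separate them at all, so Weather would be used at the root node.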

Types of Attribute Selection Measures

There are three main measures:

1. Entropy

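Entropy measures the impurity or disorder in a dataset. If $p_i$ is the proportion of records in class $i$, then:

$$H(D) = -\sum_{i=1}^{n} p_i \log_2 p_i$$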

High entropy → Data is mixed (impure)

Low entropy → Data is uniform (pure)
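To make the high and low cases concrete, here is a minimal sketch that computes entropy for a list of class labels. It assumes Shannon entropy with base-2 logarithms (so the result is in bits). The function name and the sample labels are made up for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum p * log2(1/p) over the classes that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

mixed = ["Yes", "No", "Yes", "No"]    # evenly mixed: high entropy
pure = ["Yes", "Yes", "Yes", "Yes"]   # single class: low entropy

print(entropy(mixed))  # 1.0
print(entropy(pure))   # 0.0
```

With two classes, 1.0 is the largest value entropy can take, and 0.0 means every record has the same label.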

2. Information Gain

Information Gain tells us how much entropy decreases after splitting the dataset.

In simple words:

It shows how useful an attribute is for classification.

Formula:

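$$IG(D, A) = H(D) - \sum_{v} \frac{|D_v|}{|D|} \, H(D_v)$$

Where:

$H(D)$ = Entropy of dataset $D$

$D_v$ = subset of $D$ in which attribute $A$ has value $v$

$|D_v|$ = size of that subset, and $|D|$ = size of the full dataset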

Rule:

Higher Information Gain = Better attribute

Example:

For attribute Gender:

IG ≈ 0.000

This means splitting on Gender leaves the entropy essentially unchanged, so Gender is not useful for splitting.
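The formula above can be sketched in a few lines of Python. This is only an illustration: the Gender, Weather, and Play values below are invented (they are not the dataset behind the IG ≈ 0.000 figure), and the function names are our own.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(D, A): entropy of the labels minus the weighted entropy of each subset D_v."""
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)          # group labels by attribute value v
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical columns, invented only for illustration.
gender = ["M", "M", "F", "F"]
weather = ["Sunny", "Rainy", "Sunny", "Rainy"]
play = ["Yes", "No", "Yes", "No"]

print(information_gain(gender, play))   # 0.0
print(information_gain(weather, play))  # 1.0
```

In this toy data each Gender group has the same Yes/No mix as the whole dataset, so the gain is zero, while Weather splits the classes cleanly and gets the maximum gain.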

3. Gini Index

Gini Index measures impurity like entropy but in a different way.

Formula:

$$Gini(D) = 1 - \sum_{i=1}^{n} p_i^2$$

Where:

$p_i$ = proportion of records in class $i$

$n$ = number of classes

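Value Range:

0 → Pure dataset (all records belong to one class)

0.5 → Maximum impurity for two classes (an even 50/50 split)

1 − 1/n → Maximum impurity for n equally likely classes; this gets closer to 1 as n grows but never reaches it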

Example:

For:

6 Yes, 4 No

Gini = 1 − (0.6² + 0.4²) = 0.48

This is close to the two-class maximum of 0.5, so the dataset is highly impure.
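The same calculation can be checked with a short Python sketch. The function name is our own, and the label list simply encodes the 6 Yes, 4 No example.

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

labels = ["Yes"] * 6 + ["No"] * 4   # 6 Yes, 4 No

print(round(gini(labels), 2))            # 0.48
print(gini(["Yes"] * 10))                # 0.0 (pure)
print(gini(["Yes"] * 5 + ["No"] * 5))    # 0.5 (even two-class split)
```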

Attribute selection measures help us choose the best feature when building a decision tree.

Entropy → Measures disorder
Information Gain → Measures improvement after split
Gini Index → Measures impurity

Using these measures can:

Improve model accuracy
Make decision trees more efficient
Help in better decision-making
