What is an Outlier in Data Mining?

Balaji. K

In data analysis, we often come across unusual data values called outliers. An outlier is a data point that is very different from the rest of the data in a dataset. It lies far away from the normal pattern or expected range of values.

Outliers can occur due to measurement errors, data entry mistakes, or natural variations in the data. During data analysis, it is important to identify these values because they may affect the accuracy of the results. In some cases, outliers are removed, while in other cases they are carefully analyzed because they may provide useful insights.

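A widely cited definition comes from Frank E. Grubbs (1969), who described an outlying observation as one that appears to deviate markedly from the other members of the sample in which it occurs.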

Difference Between Outliers and Noise

Noise refers to random errors or unwanted variations in measured data. It usually occurs due to problems in measurement, data collection, or transmission.

Outliers are extreme data points that significantly differ from the rest of the dataset.

Before detecting outliers, it is usually recommended to remove noise from the dataset, because noise can make outlier detection more difficult.

Types of Outliers

Outliers in data mining are generally classified into three types:

Global (Point) Outliers
Collective Outliers
Contextual (Conditional) Outliers

1. Global Outliers (Point Outliers)

Global outliers are the simplest type of outliers.

A global outlier occurs when a single data point is very different from all other data points in the dataset.

Most outlier detection methods in data mining focus on identifying this type of outlier.

Example:

If most students' exam scores fall between 60 and 80, but one student scores 10, that score can be considered a global outlier.
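The exam-score example can be sketched in a few lines of Python. This is only one possible approach: it uses the common interquartile range (IQR) rule, which the article itself does not prescribe. The scores, the function name `global_outliers`, and the multiplier `k = 1.5` are illustrative assumptions.

```python
from statistics import quantiles

def global_outliers(values, k=1.5):
    """Return values lying outside the IQR fences (k = 1.5 is a common convention)."""
    q1, _, q3 = quantiles(values, n=4)   # first, second and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical exam scores: most fall between 60 and 80, one student scored 10.
scores = [62, 65, 68, 70, 71, 73, 75, 78, 80, 10]
print(global_outliers(scores))  # [10]
```

Other rules, such as a z-score cutoff, could be swapped in; the IQR rule is used here because it is less affected by the extreme value it is trying to find.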

2. Collective Outliers

A collective outlier occurs when a group of data points together behaves abnormally compared to the rest of the dataset.

In this case, individual data points may appear normal, but when considered as a group, they show unusual behavior.

Example:

In a network intrusion detection system, a few data packets sent from one computer may be normal. However, if many computers send a large number of packets to the same target at the same time, it may indicate a distributed Denial-of-Service (DDoS) attack. Taken together, the group of packets is a collective outlier.
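A minimal sketch of the same idea, assuming packet counts are sampled once per second: no single reading is above the per-reading limit, but the total over a sliding window is. The data, the function name `collective_windows`, and both thresholds are hypothetical values chosen for illustration, not taken from any real intrusion detection system.

```python
def collective_windows(series, window, max_total):
    """Return start indices of windows whose total exceeds max_total."""
    flagged = []
    for start in range(len(series) - window + 1):
        if sum(series[start:start + window]) > max_total:
            flagged.append(start)
    return flagged

# Hypothetical packets-per-second readings; each one is within the
# assumed per-reading limit of 10, so none is a global outlier.
packets = [4, 9, 3, 5, 10, 4, 9, 9, 10, 9, 9, 5, 3, 4]
PER_READING_LIMIT = 10   # assumed limit for a single reading
WINDOW = 4               # assumed window length in seconds
WINDOW_LIMIT = 32        # assumed limit for the total over one window

print(all(p <= PER_READING_LIMIT for p in packets))       # True
print(collective_windows(packets, WINDOW, WINDOW_LIMIT))  # [6, 7, 8]
```

The flagged windows all cover the sustained burst in the middle of the series, which is the group-level behavior that a point-by-point check would miss.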

3. Contextual Outliers (Conditional Outliers)

A contextual outlier occurs when a data point is considered unusual only within a specific context or condition.

These outliers depend on two types of attributes:

Contextual attributes – define the context (e.g., time, location)

Behavioral attributes – define the behavior of the data

Example:

A temperature of 45°C may be normal during summer, but it would be unusual during the rainy season or winter. Therefore, the same value can be normal or an outlier depending on the context.
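The temperature example can be sketched by treating the season as the contextual attribute and the temperature as the behavioral attribute. The sketch below compares each reading only with other readings from the same season, using the median and the median absolute deviation (MAD). The readings, the function name `contextual_outliers`, and the cutoff of 5 are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def contextual_outliers(records, threshold=5.0):
    """Flag (context, value) pairs that lie far from their own context's median."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, values in by_context.items():
        center = median(values)
        mad = median(abs(v - center) for v in values)  # median absolute deviation
        if mad == 0:
            continue  # no spread in this context, so the ratio is undefined
        for v in values:
            if abs(v - center) / mad > threshold:
                flagged.append((context, v))
    return flagged

# Hypothetical temperature readings in degrees Celsius.
readings = [
    ("summer", 41), ("summer", 43), ("summer", 45), ("summer", 44), ("summer", 42),
    ("winter", 18), ("winter", 20), ("winter", 45), ("winter", 19), ("winter", 21),
]
print(contextual_outliers(readings))  # [('winter', 45)]
```

The value 45 appears in both seasons, but only the winter reading is flagged, because it is judged against winter readings alone.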

Outlier Analysis

The process of identifying and studying unusual data points in a dataset is called Outlier Analysis or Outlier Mining. It is an important task in data mining because rare events often provide valuable information.

Although outliers are sometimes removed from datasets, they are very useful in many real-world applications.

Applications of Outlier Detection

Outlier detection is widely used in several fields, such as:

Fraud detection in banking, credit cards, and insurance
Telecommunication fraud detection
Medical diagnosis and treatment analysis
Market analysis to understand unusual customer behavior
Network intrusion detection systems
Financial data monitoring

For example, in medical analysis, unusual patient responses to a treatment can be identified through outlier analysis.
