What is an Outlier in Data Mining?

kumudha


In data analysis, we often come across unusual data values called outliers. An outlier is a data point that is very different from the rest of the data in a dataset. It lies far away from the normal pattern or expected range of values.

Outliers can occur due to measurement errors, data entry mistakes, or natural variations in the data. During data analysis, it is important to identify these values because they may affect the accuracy of the results. In some cases, outliers are removed, while in other cases they are carefully analyzed because they may provide useful insights.

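One widely cited definition comes from Frank E. Grubbs (1969), who described an outlier as an observation that appears to deviate markedly from the other members of the sample in which it occurs.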

Difference Between Outliers and Noise

  • Noise refers to random errors or unwanted variations in measured data. It usually occurs due to problems in measurement, data collection, or transmission.
  • Outliers are extreme data points that significantly differ from the rest of the dataset.
Before detecting outliers, it is usually recommended to reduce noise in the dataset, because random variation can hide genuine outliers or make ordinary points look extreme.

Types of Outliers

Outliers in data mining are generally classified into three types:
  • Global (Point) Outliers
  • Collective Outliers
  • Contextual (Conditional) Outliers

1. Global Outliers (Point Outliers)

Global outliers are the simplest type of outliers.
A global outlier occurs when a single data point is very different from all other data points in the dataset.

Most outlier detection methods in data mining focus on identifying this type of outlier.

Example:

If most students' exam scores fall between 60 and 80, but one student scores 10, that value can be considered a global outlier.
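The exam-score example can be checked with a simple statistical rule. The sketch below uses the interquartile range (IQR) rule, which is only one common way to flag global outliers and is not the only possible method. The function name `iqr_outliers`, the multiplier `k=1.5`, and the sample scores are assumptions made for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical exam scores: most lie between 60 and 80, one student scored 10
scores = [62, 65, 68, 70, 71, 73, 75, 78, 80, 10]
print(iqr_outliers(scores))  # → [10]
```

A larger `k` makes the rule stricter, so fewer points are flagged.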

2. Collective Outliers

A collective outlier occurs when a group of data points together behaves abnormally compared to the rest of the dataset.

In this case, individual data points may appear normal, but when considered as a group, they show unusual behavior.

Example:

In a network intrusion detection system, sending a few data packets from one computer may be normal. However, if many computers send a large number of packets at the same time, it may indicate a Denial-of-Service (DoS) attack. The group of packets together becomes a collective outlier.
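The packet example can be sketched in code. In this minimal illustration, no single record looks unusual, but the total traffic in one second is far above the typical total. The record layout `(second, host, packets)`, the function name `burst_seconds`, and the `factor=3.0` threshold are all assumptions, and a real intrusion detection system would be much more involved.

```python
from collections import defaultdict
from statistics import median

def burst_seconds(records, factor=3.0):
    """Return the seconds whose total packet count exceeds
    factor times the median per-second total."""
    totals = defaultdict(int)
    for second, host, packets in records:
        totals[second] += packets
    baseline = median(totals.values())
    return sorted(s for s, t in totals.items() if t > factor * baseline)

# Hypothetical traffic: two hosts send 10 packets every second ...
records = [(s, f"host{h}", 10) for s in range(6) for h in range(2)]
# ... and in second 3, eight more hosts each send 12 packets
records += [(3, f"host{h}", 12) for h in range(2, 10)]

print(max(p for _, _, p in records))  # → 12 (each record is small on its own)
print(burst_seconds(records))         # → [3]
```

The group is flagged because of the combined count, not because of any individual packet record.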

3. Contextual Outliers (Conditional Outliers)

A contextual outlier occurs when a data point is considered unusual only within a specific context or condition.

These outliers depend on two types of attributes:
  • Contextual attributes: define the context (e.g., time, location)
  • Behavioral attributes: define the behavior of the data

Example:

A temperature of 45°C may be normal during summer, but it would be unusual during the rainy season or winter. Here the season is the contextual attribute and the temperature is the behavioral attribute. Therefore, the same value can be normal or an outlier depending on the context.
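One way to sketch the temperature example is to compare each reading only with other readings from the same season. The code below groups values by context and flags those far from their own group's median, measured in units of the median absolute deviation (MAD). The function name `contextual_outliers`, the threshold `k=5.0`, and the sample readings are assumptions for illustration; other per-context rules would work as well.

```python
from collections import defaultdict
from statistics import median

def contextual_outliers(readings, k=5.0):
    """Flag (context, value) pairs that lie more than k MADs
    from the median of their own context."""
    groups = defaultdict(list)
    for context, value in readings:
        groups[context].append(value)

    flagged = []
    for context, values in groups.items():
        centre = median(values)
        mad = median(abs(v - centre) for v in values)
        if mad == 0:
            continue  # no spread in this context; this sketch skips it
        flagged += [(context, v) for v in values if abs(v - centre) > k * mad]
    return flagged

# Hypothetical readings in °C: season is the context, temperature the behavior
readings = [("summer", t) for t in (42, 44, 45, 43, 46, 44)]
readings += [("winter", t) for t in (18, 20, 19, 21, 45, 20, 19, 22)]

print(contextual_outliers(readings))  # → [('winter', 45)]
```

The summer reading of 45 is not flagged, while the same value in winter is.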

Outlier Analysis

The process of identifying and studying unusual data points in a dataset is called Outlier Analysis or Outlier Mining. It is an important task in data mining because rare events often provide valuable information.

Although outliers are sometimes removed from datasets, they are very useful in many real-world applications.

Applications of Outlier Detection

Outlier detection is widely used in several fields, such as:
  • Fraud detection in banking, credit cards, and insurance
  • Telecommunication fraud detection
  • Medical diagnosis and treatment analysis
  • Market analysis to understand unusual customer behavior
  • Network intrusion detection systems
  • Financial data monitoring
For example, in medical analysis, unusual patient responses to a treatment can be identified through outlier analysis.

