Class Comparison Methods in Data Mining

Balaji. K

In data mining, users do not always want to study just one group (or class). Instead, they often want to compare one class with another to understand the differences between them.

This process is called class comparison (or class discrimination). It helps to find patterns that distinguish a target class from other similar classes.

Important point: The classes being compared must be similar in structure (i.e., they should have the same type of attributes).

For example:

Valid comparison: Computer Science students vs Physics students
Invalid comparison: Person vs Address vs Item (not comparable)
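
The structural requirement above can be checked mechanically. Below is a minimal sketch, assuming each class is described by a hypothetical schema that maps attribute names to types; the schemas and the comparable helper are illustrative and not part of any specific tool.

```python
# Hypothetical schemas: attribute name -> Python type (illustrative only).
cs_students = {"name": str, "program": str, "gpa": float}
physics_students = {"name": str, "program": str, "gpa": float}
address = {"street": str, "city": str}


def comparable(schema_a, schema_b):
    """Treat two classes as comparable when their attributes and types match."""
    return schema_a == schema_b


print(comparable(cs_students, physics_students))  # True
print(comparable(cs_students, address))           # False
```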

Class Comparison vs Class Characterization

Class Characterization → Describes a single class
Class Comparison → Compares two or more classes

Example:

Comparing sales in 2003 vs sales in 2004 is a class comparison.

Synchronous Generalization

To make fair comparisons, data must be generalized to the same level of detail.

Example:

If we compare sales data:

Both datasets should be at the same location level:

City level OR
State level OR
Country level

Wrong comparison:

Sales in Vancouver (city) vs Sales in USA (country)

Correct comparison:

Sales in Canada vs Sales in USA (both at country level)

Users can also manually adjust these levels if needed.
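
A minimal sketch of bringing two datasets to the same location level before comparing them. The concept hierarchy (city to state to country), the sales figures, and the function names are assumptions made up for illustration.

```python
# Hypothetical concept hierarchy for location: city -> state/province -> country.
CITY_TO_STATE = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "Seattle": "Washington",
    "Chicago": "Illinois",
}
STATE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Washington": "USA",
    "Illinois": "USA",
}


def generalize_location(city, level):
    """Climb the hierarchy from a city up to the requested level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


def roll_up(sales, level):
    """Aggregate (city, amount) rows to the given location level."""
    totals = {}
    for city, amount in sales:
        key = generalize_location(city, level)
        totals[key] = totals.get(key, 0) + amount
    return totals


# Made-up sales rows for the two classes being compared.
sales_2003 = [("Vancouver", 100), ("Toronto", 150), ("Seattle", 200)]
sales_2004 = [("Vancouver", 120), ("Chicago", 180), ("Seattle", 210)]

# The same level is applied to both classes, so the rows line up.
level = "country"
print(roll_up(sales_2003, level))  # {'Canada': 250, 'USA': 200}
print(roll_up(sales_2004, level))  # {'Canada': 120, 'USA': 390}
```

Changing level to "state" or "city" adjusts both classes together, which is what keeps the comparison fair.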

Steps in Class Comparison

1. Data Collection

Collect relevant data using queries.

Divide data into:

Target class
Contrasting class(es)
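
As a rough illustration of this step, the sketch below splits a small, made-up list of student records into a target class and its contrasting classes. The records and the split_classes helper are hypothetical.

```python
# Made-up student records standing in for the result of a query.
students = [
    {"name": "Asha", "status": "graduate", "gpa": 3.8},
    {"name": "Ben", "status": "undergraduate", "gpa": 3.1},
    {"name": "Chen", "status": "graduate", "gpa": 3.5},
    {"name": "Dina", "status": "undergraduate", "gpa": 3.6},
    {"name": "Eli", "status": "undergraduate", "gpa": 2.9},
]


def split_classes(rows, attribute, target_value, contrast_values):
    """Return the target class rows and a dict of contrasting class rows."""
    target = [r for r in rows if r[attribute] == target_value]
    contrasting = {
        value: [r for r in rows if r[attribute] == value]
        for value in contrast_values
    }
    return target, contrasting


target, contrasting = split_classes(
    students, "status", "graduate", ["undergraduate"]
)
print(len(target), len(contrasting["undergraduate"]))  # 2 3
```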

2. Dimension Relevance Analysis

If there are many attributes, select only the important ones.
This improves accuracy and reduces complexity.
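
One possible way to decide which attributes are important is to rank them by information gain with respect to the class label. The text above does not name a specific measure, so treat information gain, the sample rows, and the threshold value below as assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, class_attribute):
    """Reduction in class entropy obtained by splitting rows on attribute."""
    base = entropy([r[class_attribute] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[class_attribute])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def relevant_attributes(rows, attributes, class_attribute, threshold):
    """Keep attributes whose gain reaches the (hypothetical) threshold."""
    return [
        a for a in attributes
        if information_gain(rows, a, class_attribute) >= threshold
    ]


# Made-up rows: program separates the classes perfectly, gender does not.
rows = [
    {"program": "MSc", "gender": "F", "status": "graduate"},
    {"program": "MSc", "gender": "M", "status": "graduate"},
    {"program": "BSc", "gender": "F", "status": "undergraduate"},
    {"program": "BSc", "gender": "M", "status": "undergraduate"},
]

print(relevant_attributes(rows, ["program", "gender"], "status", 0.1))
# ['program']
```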

3. Synchronous Generalization

Generalize the target class data to a chosen level.

Apply the same level of generalization to contrasting classes.

4. Presentation of Results

Show the comparison using:

Tables
Charts (bar, pie, etc.)
Rules

A common measure is count%, which shows the percentage of a class's tuples that fall into each generalized group.
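
A minimal sketch of how count% could be computed for one class after generalization. The generalized tuples below are made up, and count_percent is a hypothetical helper name.

```python
from collections import Counter


def count_percent(generalized_tuples):
    """Percentage of the class's tuples that fall under each generalized tuple."""
    counts = Counter(generalized_tuples)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()}


# Made-up generalized tuples (program area, birth place) for one class.
graduates = [("Science", "Canada")] * 3 + [("Business", "Foreign")]

print(count_percent(graduates))
# {('Science', 'Canada'): 75.0, ('Business', 'Foreign'): 25.0}
```

Running the same function on each contrasting class gives percentages that can be placed side by side in a table or chart.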

Example (DMQL Query)

Comparing graduate vs undergraduate students:

use University_Database
mine comparison as "graduate_students_vs_undergraduate_students"
in relevance to name, gender, program, birth_place, birth_date, residence, phone_no, GPA
for "graduate_students"
where status in "graduate"
versus "undergraduate_students"
where status in "undergraduate"
analyze count%
from student

Key Terms Explained

Attributes → Data fields (e.g., name, gender, GPA)
Concept Hierarchy (Gen(a_i)) → Levels of data abstraction defined for attribute a_i
Thresholds (U_i, T_i) → Limits used for analysis (U_i) and generalization (T_i) of attribute a_i
Relevance Threshold (R) → Minimum relevance an attribute needs in order to be kept for the comparison

Presentation of Class Comparison

Results can be shown using:

Tables
Charts (bar chart, pie chart, curves)
Cross-tabulations
Rules

Discriminant Rules

Class comparison results are often expressed using discriminant rules.

These rules:

Highlight differences between classes
Use a measure called d-weight
Show how strongly a feature distinguishes one class from another
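
The d-weight of a generalized tuple for a target class is the number of target-class tuples it covers divided by the number of tuples it covers across all compared classes. The sketch below computes it from per-class counts; the counts and the tuple values are made up for illustration.

```python
def d_weight(counts_by_class, generalized_tuple, target_class):
    """Share of the tuple's total count that comes from the target class."""
    target_count = counts_by_class[target_class].get(generalized_tuple, 0)
    total = sum(c.get(generalized_tuple, 0) for c in counts_by_class.values())
    if total == 0:
        raise ValueError("generalized tuple does not occur in any class")
    return target_count / total


# Made-up counts of one generalized tuple (birth country, GPA band) per class.
counts_by_class = {
    "graduate": {("Canada", "good"): 90},
    "undergraduate": {("Canada", "good"): 210},
}

weight = d_weight(counts_by_class, ("Canada", "good"), "graduate")
print(f"{weight:.0%}")  # 30%
```

A d-weight close to 100% means the tuple appears almost only in the target class, so it discriminates strongly; a low value means the tuple is mostly found in the contrasting classes.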
