Class Comparison Methods in Data Mining

Balaji. K

In data mining, users do not always want to study just one group (or class). Instead, they often want to compare one class with another to understand the differences between them.

This process is called class comparison (or class discrimination). It helps to find patterns that distinguish a target class from other similar classes.

Important point: The classes being compared must be similar in structure (i.e., they should have the same type of attributes).

For example:
  •  Valid comparison: Computer Science students vs Physics students
  •  Invalid comparison: Person vs Address vs Item (not comparable)
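
The "same type of attributes" requirement can be checked mechanically. The snippet below is a minimal sketch, assuming each class is a list of dictionaries; the function name `are_comparable` and the sample records are hypothetical and only illustrate the idea.

```python
def are_comparable(class_a, class_b):
    """Treat two classes as comparable when all their records share one attribute set."""
    # frozenset(record) collects the attribute names (dictionary keys) of a record.
    attrs_a = {frozenset(record) for record in class_a}
    attrs_b = {frozenset(record) for record in class_b}
    # Each class must be internally consistent and both must use the same attributes.
    return len(attrs_a) == 1 and attrs_a == attrs_b


# Hypothetical sample data (not from the article).
cs_students = [{"name": "Asha", "gpa": 3.6}, {"name": "Ravi", "gpa": 3.2}]
physics_students = [{"name": "Meena", "gpa": 3.8}]
addresses = [{"street": "12 Main Rd", "city": "Chennai"}]

print(are_comparable(cs_students, physics_students))  # same attributes
print(are_comparable(cs_students, addresses))         # different attributes
```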

Class Comparison vs Class Characterization

  •  Class Characterization → Describes a single class
  •  Class Comparison → Compares two or more classes
Example:
Comparing sales in 2003 vs sales in 2004 is a class comparison.

Synchronous Generalization

To make fair comparisons, data must be generalized to the same level of detail.

Example:
If we compare sales data, both datasets should be at the same location level:
  •  City level OR
  •  State level OR
  •  Country level
Wrong comparison:
  •  Sales in Vancouver (city) vs Sales in USA (country)
Correct comparison:
  •  Both should be at the same level (e.g., both at country level)
Users can also manually adjust these levels if needed.
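
As a rough illustration of generalizing both classes to the same level, the sketch below rolls city-level sales up to country level before comparing. The hierarchy tables, function names, and sales figures are all made up for this example.

```python
# Hypothetical concept hierarchy: city -> state/province -> country.
CITY_TO_STATE = {
    "Vancouver": "British Columbia",
    "Seattle": "Washington",
    "Chicago": "Illinois",
}
STATE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Washington": "USA",
    "Illinois": "USA",
}


def generalize_location(city, level):
    """Climb the location hierarchy from a city to the requested level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


def total_sales_by_location(rows, level):
    """Sum sales after generalizing every row to the same level."""
    totals = {}
    for city, amount in rows:
        key = generalize_location(city, level)
        totals[key] = totals.get(key, 0) + amount
    return totals


# Made-up sales rows: (city, amount).
target_rows = [("Vancouver", 100)]
contrast_rows = [("Seattle", 80), ("Chicago", 120)]

# Both classes are generalized to the SAME level before they are compared.
target = total_sales_by_location(target_rows, "country")
contrast = total_sales_by_location(contrast_rows, "country")
print(target, contrast)
```

Passing the level as a parameter mirrors the point that users can adjust it: calling both functions with "state" instead of "country" would compare at state level.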

Steps in Class Comparison

1. Data Collection
Collect relevant data using queries.

Divide data into:
  •  Target class
  •  Contrasting class(es)
2. Dimension Relevance Analysis
  •  If there are many attributes, select only the important ones.
  •  This improves accuracy and reduces complexity.
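
One possible way to score attribute importance is information gain with respect to the class label. The article does not say which measure is used, so treat the measure, the threshold value, and the sample rows below as assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, class_key="class"):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    labels = [row[class_key] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[class_key])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up rows: "program" separates the classes, "gender" does not.
rows = [
    {"class": "graduate", "program": "MSc", "gender": "F"},
    {"class": "graduate", "program": "MSc", "gender": "M"},
    {"class": "undergraduate", "program": "BSc", "gender": "F"},
    {"class": "undergraduate", "program": "BSc", "gender": "M"},
]

RELEVANCE_THRESHOLD = 0.1  # assumed cut-off, chosen only for this example

gains = {a: information_gain(rows, a) for a in ("program", "gender")}
relevant = [a for a, g in gains.items() if g >= RELEVANCE_THRESHOLD]
print(gains, relevant)
```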
3. Synchronous Generalization
Generalize the target class data to a chosen level.
Apply the same level of generalization to contrasting classes.

4. Presentation of Results

Show the comparison using:
  •  Tables
  •  Charts (bar, pie, etc.)
  •  Rules
A common measure used is count%, which shows, for each class, the percentage of that class's records covered by each generalized row.
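
A minimal sketch of computing count% is shown below, assuming it means the percentage of a class's records that fall into each generalized row. The function name and the generalized (major, age range) tuples are invented for the example.

```python
from collections import Counter


def count_percent(generalized_tuples):
    """Map each distinct generalized tuple to its share (in %) of the class."""
    counts = Counter(generalized_tuples)
    total = sum(counts.values())
    return {row: 100.0 * n / total for row, n in counts.items()}


# Made-up generalized tuples: (major, age_range).
graduate = [
    ("Science", "21-25"),
    ("Science", "26-30"),
    ("Science", "26-30"),
    ("Business", "26-30"),
]
undergraduate = [
    ("Science", "16-20"),
    ("Science", "21-25"),
    ("Science", "21-25"),
    ("Business", "21-25"),
]

grad_pct = count_percent(graduate)
under_pct = count_percent(undergraduate)

# Rows of both classes can now be put side by side in a table or chart.
for row in sorted(set(grad_pct) | set(under_pct)):
    print(row, grad_pct.get(row, 0.0), under_pct.get(row, 0.0))
```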

Example (DMQL Query)

Comparing graduate vs undergraduate students:

use University_Database
mine comparison as "graduate_students_vs_undergraduate_students"
in relevance to name, gender, program, birth_place, birth_date, residence, phone_no, GPA
for "graduate_students"
where status in "graduate"
versus "undergraduate_students"
where status in "undergraduate"
analyze count%
from student
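
For readers who do not use DMQL, the data-collection part of this query can be approximated in Python. This is only a sketch: it assumes the student relation is a list of dictionaries, and the function name `collect_classes` and the sample records are hypothetical.

```python
# Attributes named in the "in relevance to" clause of the query.
RELEVANT = ["name", "gender", "program", "birth_place",
            "birth_date", "residence", "phone_no", "GPA"]


def collect_classes(students, relevant=RELEVANT):
    """Split students into target and contrasting classes, keeping relevant attributes only."""
    target, contrasting = [], []
    for student in students:
        projected = {attr: student.get(attr) for attr in relevant}
        if student["status"] == "graduate":          # for "graduate_students"
            target.append(projected)
        elif student["status"] == "undergraduate":   # versus "undergraduate_students"
            contrasting.append(projected)
        # Records with any other status belong to neither class and are skipped.
    return target, contrasting


# Made-up records for illustration.
students = [
    {"name": "A", "status": "graduate", "program": "CS", "GPA": 3.7},
    {"name": "B", "status": "undergraduate", "program": "Physics", "GPA": 3.1},
    {"name": "C", "status": "exchange", "program": "CS", "GPA": 3.4},
]

target, contrasting = collect_classes(students)
print(len(target), len(contrasting))
```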

Key Terms Explained

  •  Attributes → Data fields (e.g., name, gender, GPA)
  •  Concept Hierarchy (Gen(a_i)) → Levels of data abstraction defined for attribute a_i
  •  Thresholds (U_i, T_i) → Limits used for analysis (U_i) and generalization (T_i) of attribute a_i
  •  Relevance Threshold (R) → Minimum relevance an attribute needs to be treated as important

Presentation of Class Comparison

Results can be shown using:
  •  Tables
  •  Charts (bar chart, pie chart, curves)
  •  Cross-tabulations
  •  Rules

Discriminant Rules

Class comparison results are often expressed using discriminant rules.

These rules:
  •  Highlight differences between classes
  •  Use a measure called d-weight
  •  Show how strongly a feature distinguishes one class from another
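
A commonly used definition of d-weight for a generalized row is the number of target-class records it covers divided by the number of records it covers across all classes being compared. The sketch below follows that definition; the function name and the counts are made up for illustration.

```python
def d_weight(counts_by_class, target_class):
    """Share of a generalized row's records that belong to the target class.

    counts_by_class maps each class name to the number of its records
    covered by one generalized row.
    """
    total = sum(counts_by_class.values())
    if total == 0:
        raise ValueError("the generalized row covers no records in any class")
    return counts_by_class[target_class] / total


# Made-up counts for one generalized row.
counts = {"graduate": 90, "undergraduate": 210}

grad_weight = d_weight(counts, "graduate")
under_weight = d_weight(counts, "undergraduate")
print(grad_weight, under_weight)
```

A d-weight close to 1 means the row is found almost only in the target class, so it distinguishes that class strongly. A value close to 0 means the row mostly describes the contrasting classes.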