Categories of Functions in Data Mining
Introduction
Data mining functions help us discover patterns, trends, and relationships in data. These functions are mainly divided into two categories:
- Descriptive Data Mining
- Predictive Data Mining
1. Descriptive Data Mining
Descriptive data mining is used to understand what has happened in the data. It helps in finding patterns, relationships, and structures.
It answers questions like:
- What patterns exist in the data?
- Are there groups (clusters) of similar data?
- Are there any unusual data points (outliers)?
Techniques Used
1. Cluster Analysis
Groups similar data items together
Helps in segmentation and pattern discovery
Example:
Grouping customers based on buying behavior
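A minimal sketch of the grouping idea, assuming each customer is described by a single number (monthly spend) and that k-means is the clustering method. The notes do not name a specific algorithm, so k-means is just one common choice, and the function name kmeans_1d and the spend figures are made up for illustration.

```python
import random

def kmeans_1d(values, k, iterations=20, seed=0):
    """Group numbers into k clusters with a basic k-means loop."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # start from k randomly chosen data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(sorted(c) for c in clusters)

# Made-up monthly spend per customer.
monthly_spend = [120, 900, 150, 950, 130, 1000]
groups = kmeans_1d(monthly_spend, k=2)
print(groups)
```

With this data the low spenders and the high spenders end up in separate groups. Real customer data would normally use several attributes per customer, not one.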
2. Association Rule Mining
Finds relationships between variables
Identifies items that occur together
Example:
Customers who buy milk also buy bread
3. Data Visualization
Represents data using charts, graphs, etc.
Makes patterns easier to understand
2. Predictive Data Mining
Predictive data mining is used to predict future outcomes using past data.
It answers questions like:
- Will a customer leave (churn)?
- What will be the future sales?
- Will a loan be repaid?
Techniques Used
1. Decision Trees
Predict outcomes by testing a sequence of conditions on the input attributes
Used mainly for classification problems; regression trees predict numerical values
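To make "conditions on the inputs" concrete, here is a sketch of the smallest possible decision tree: a single split on one numeric attribute (often called a decision stump). It assumes one made-up attribute (income, in thousands) and a yes/no label (loan repaid). The names learn_stump and predict are hypothetical, and real tree learners usually pick splits with measures such as Gini impurity or entropy and grow several levels.

```python
def learn_stump(values, labels):
    """Pick the threshold that misclassifies the fewest training rows.

    The stump predicts True when value >= threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted(set(values)):
        predictions = [v >= threshold for v in values]
        errors = sum(p != y for p, y in zip(predictions, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, value):
    """Apply the learned condition to a new value."""
    return value >= threshold

# Made-up training data: income (in thousands) and whether the loan was repaid.
incomes = [20, 25, 30, 45, 50, 60]
repaid = [False, False, False, True, True, True]

threshold = learn_stump(incomes, repaid)
print(threshold, predict(threshold, 55), predict(threshold, 22))
```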
2. Neural Networks
Learn complex patterns from examples instead of hand-written rules
Used in image recognition, speech processing, etc.
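As a rough illustration of learning from examples, the sketch below trains a perceptron, which is a single artificial neuron and the simplest building block of a neural network. It is not a full multi-layer network. The task (learning the logical AND of two inputs) and the names train_perceptron and classify are assumptions chosen to keep the example small.

```python
def classify(weights, bias, x):
    """Output 1 when the weighted sum of the inputs is positive, else 0."""
    total = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, labels, epochs=20, learning_rate=1):
    """Adjust the weights a little every time a sample is misclassified."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - classify(weights, bias, x)  # -1, 0, or +1
            weights[0] += learning_rate * error * x[0]
            weights[1] += learning_rate * error * x[1]
            bias += learning_rate * error
    return weights, bias

# Training examples for logical AND: output 1 only when both inputs are 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [classify(weights, bias, x) for x in samples]
print(predictions)
```

Nobody writes the rule "both inputs must be 1" into the code; the weights that encode it are found from the examples.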
3. Regression Analysis
Predicts numerical values
Example: Predicting sales revenue
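A small sketch of predicting a numerical value, assuming simple linear regression (one input, fitted by least squares). The figures for advertising spend and sales revenue are made up, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up history: advertising spend and the sales revenue that followed.
ad_spend = [1, 2, 3, 4]
revenue = [12, 14, 16, 18]

slope, intercept = fit_line(ad_spend, revenue)
forecast = slope * 5 + intercept  # predicted revenue for an ad spend of 5
print(slope, intercept, forecast)
```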
Key Point:
- Descriptive mining → Understand data
- Predictive mining → Forecast future
Both are important for better decision-making.
Data Mining Functionalities
1. Class / Concept Description
This function describes a class or concept in summarized, concise terms, either on its own or in comparison with other classes.
Data Characterization
Summarizes features of a target group
Output: charts, graphs, summaries
Example:
Customers who spend more than ₹5,000/year are usually aged 40–50 with good credit scores.
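Characterization can be sketched as "filter the target group, then summarize it". The customer records, field names, and the characterize helper below are all made up for illustration.

```python
from statistics import mean

def characterize(rows, min_spend):
    """Summarize the customers whose yearly spend exceeds min_spend."""
    target = [r for r in rows if r["yearly_spend"] > min_spend]
    return {
        "count": len(target),
        "avg_age": mean(r["age"] for r in target),
        "avg_credit_score": mean(r["credit_score"] for r in target),
    }

# Made-up customer records.
customers = [
    {"age": 45, "credit_score": 760, "yearly_spend": 7200},
    {"age": 42, "credit_score": 740, "yearly_spend": 6100},
    {"age": 48, "credit_score": 780, "yearly_spend": 5500},
    {"age": 23, "credit_score": 640, "yearly_spend": 1200},
    {"age": 67, "credit_score": 700, "yearly_spend": 900},
]

summary = characterize(customers, min_spend=5000)
print(summary)
```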
Data Discrimination
Compares two or more groups
Example:
Frequent buyers → Age 20–40, educated
Rare buyers → Young or elderly, less education
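Discrimination can be sketched as computing the same statistic for two groups and placing the results side by side. The buyer records, the cut-off of 10 purchases, and the helper names below are assumptions made for illustration.

```python
def share_aged_20_to_40(rows):
    """Fraction of the group whose age is between 20 and 40 (inclusive)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if 20 <= r["age"] <= 40) / len(rows)

def compare_groups(rows, min_purchases):
    """Split buyers into frequent and rare, then compare their age profile."""
    frequent = [r for r in rows if r["purchases"] >= min_purchases]
    rare = [r for r in rows if r["purchases"] < min_purchases]
    return {
        "frequent": share_aged_20_to_40(frequent),
        "rare": share_aged_20_to_40(rare),
    }

# Made-up buyers: age and number of purchases in a year.
buyers = [
    {"age": 25, "purchases": 12},
    {"age": 33, "purchases": 15},
    {"age": 38, "purchases": 10},
    {"age": 52, "purchases": 11},
    {"age": 17, "purchases": 1},
    {"age": 70, "purchases": 2},
    {"age": 29, "purchases": 3},
    {"age": 66, "purchases": 0},
]

comparison = compare_groups(buyers, min_purchases=10)
print(comparison)
```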
2. Mining Frequent Patterns, Associations, and Correlations
Frequent Patterns
These are commonly occurring patterns in data.
Frequent Itemset: Items bought together (e.g., milk & sugar)
Frequent Subsequence: Sequence of events (e.g., phone → case purchase)
Frequent Substructure: Patterns in graphs or trees
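A sketch of finding frequent itemsets of size two: count every pair of items that appears in the same basket and keep the pairs that reach a minimum share of all baskets. The baskets, the 50% cut-off, and the function name frequent_pairs are made up. Algorithms such as Apriori extend the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: share of baskets} for pairs meeting min_support."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort so that (milk, sugar) and (sugar, milk) count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }

# Made-up shopping baskets.
baskets = [
    ["milk", "sugar", "bread"],
    ["milk", "sugar"],
    ["bread", "butter"],
    ["milk", "sugar", "butter"],
]

result = frequent_pairs(baskets, min_support=0.5)
print(result)
```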
Association Analysis
Finds relationships between data items.
Example Rule:
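If a person buys a computer, they are likely to also buy software (computer → software)
Support: Fraction of all transactions that contain every item in the rule
Confidence: Fraction of the transactions containing the "if" items that also contain the "then" items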
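Support and confidence for the rule computer → software can be computed directly from transaction counts. The purchases below are made up, and the helper names are hypothetical.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in items."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))

def support(transactions, antecedent, consequent):
    """Share of all transactions containing both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never applies in this data
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / with_antecedent

# Made-up purchase records.
purchases = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "software"},
]

rule_support = support(purchases, {"computer"}, {"software"})        # 3 of 5
rule_confidence = confidence(purchases, {"computer"}, {"software"})  # 3 of 4
print(rule_support, rule_confidence)
```

Here three of the five transactions contain both items, and three of the four computer buyers also bought software.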
Correlation Analysis
Measures how strongly two variables are related
Example:
Height and weight are usually positively correlated: taller people tend to weigh more.
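One common way to measure the strength of a linear relationship is the Pearson correlation coefficient, which ranges from -1 to +1. The notes do not name a specific measure, so Pearson is an assumption here, and the height and weight values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Made-up measurements: height in cm, weight in kg.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 65, 72, 82]

r = pearson(heights, weights)
print(round(r, 3))
```

A value close to +1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means little linear relationship.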
Data Mining Task Primitives
These are the basic building blocks used to specify a data mining task.
1. Task-Relevant Data
Only the data relevant to the task is selected for analysis
Example: Customer age, sales data
2. Type of Knowledge to be Mined
Defines what we want to find:
- Classification
- Clustering
- Prediction
- Association
3. Background Knowledge
Existing knowledge about the domain
Guides the search and helps judge whether the patterns found make sense
Example: Industry rules or customer behavior patterns
4. Interestingness Measures
Help decide which discovered patterns are useful
Common measures:
- Support
- Confidence
- Utility
- Novelty
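In practice, such measures are often applied as thresholds: a pattern is kept only if it scores high enough. The sketch below filters rules by minimum support and confidence. The rules, their scores, and the threshold values are all made up, and keep_interesting is a hypothetical helper.

```python
def keep_interesting(rules, min_support, min_confidence):
    """Keep only the rules that meet both thresholds."""
    return [
        r["rule"]
        for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]

# Made-up rules with made-up scores.
candidate_rules = [
    {"rule": "computer -> software", "support": 0.60, "confidence": 0.75},
    {"rule": "printer -> paper", "support": 0.05, "confidence": 0.90},
    {"rule": "computer -> printer", "support": 0.40, "confidence": 0.30},
]

kept = keep_interesting(candidate_rules, min_support=0.3, min_confidence=0.7)
print(kept)
```

The second rule is dropped because it is too rare, and the third because it holds too seldom when it applies.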
5. Presentation and Visualization of Results
Presents results using:
- Charts
- Graphs
- Tables
Makes insights easy to understand for everyone
Final Summary
Data mining functions fall into two main categories:
- Descriptive → Understand data
- Predictive → Predict future
It uses techniques like:
- Clustering
- Classification
- Association
Visualization and evaluation help make results useful and actionable.