Tasks and Functionalities of Data Mining
What is Data Mining?
Data mining is the process of automatically or semi-automatically analyzing
large datasets to discover useful patterns such as:
- Groups (clustering)
- Relationships (associations)
- Unusual data (outliers or anomalies)
- Sequences (patterns over time)
These patterns help in better decision-making and can be further used
in machine learning and predictive analytics.
Note: Data collection, data cleaning, and reporting of results are not part
of the data mining step itself. They are separate steps of the wider
knowledge discovery process.
Data Mining vs Data Analysis
Many people confuse data mining with data analysis, but they are
different:
Data Mining
- Focuses on discovering hidden patterns using machine learning and statistical techniques.
Data Analysis
- Focuses on testing hypotheses and understanding data using statistical methods.
Types of Data Mining Tasks
1. Descriptive Data Mining
- Explains what is happening in the data
- Finds patterns without prior knowledge
- Examples: counts, averages, and other summaries of the data
2. Predictive Data Mining
- Predicts future or unknown outcomes using past data
- Examples: predicting next quarter's sales, detecting diseases from medical data
Functionalities of Data Mining
These are the main operations used to discover patterns in
data:
1. Class / Concept Description
Describes a class of data, either by summarizing it or by contrasting it with other classes.
a. Data Characterization
- Summarizes the features of a dataset
- Example: average sales, total customers
b. Data Discrimination
- Compares two or more groups
- Example: comparing buyers vs non-buyers
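Both ideas can be sketched in a few lines of Python. This is only an illustration: the customer records and the field names (age, visits, bought) are made up, and a real dataset would have many more rows and features.

```python
from statistics import mean

# Hypothetical customer records, invented for illustration only.
customers = [
    {"age": 25, "visits": 2, "bought": False},
    {"age": 34, "visits": 8, "bought": True},
    {"age": 41, "visits": 6, "bought": True},
    {"age": 29, "visits": 1, "bought": False},
]

def characterize(rows):
    """Data characterization: summarize the features of one group."""
    return {
        "count": len(rows),
        "avg_age": mean(r["age"] for r in rows),
        "avg_visits": mean(r["visits"] for r in rows),
    }

# Data discrimination: build the same summary per group, then compare.
buyers = characterize([r for r in customers if r["bought"]])
non_buyers = characterize([r for r in customers if not r["bought"]])
visit_gap = buyers["avg_visits"] - non_buyers["avg_visits"]
```

Here characterize() plays the role of data characterization, and the difference between the two summaries (visit_gap) is a very simple form of data discrimination.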
2. Mining Frequent Patterns
Finds commonly occurring patterns in data.
- Frequent Itemset: items often bought together (e.g., milk and bread)
- Frequent Subsequence: events that often occur in the same order (e.g., buying a phone, then a phone cover)
- Frequent Substructure: recurring patterns in structured data such as trees or graphs
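As a rough sketch of frequent itemset mining, the snippet below counts every pair of items across a few made-up baskets and keeps the pairs that reach a minimum count. It is plain brute-force counting, not an optimized algorithm such as Apriori, and the baskets and the min_count value are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def frequent_pairs(transactions, min_count):
    """Count every item pair and keep those seen at least min_count times."""
    counts = Counter()
    for items in transactions:
        # Sort so each pair has one canonical order, e.g. ("bread", "milk").
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(baskets, min_count=2)
```

With these baskets, only (bread, milk) and (bread, butter) occur at least twice, so they are the frequent pairs.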
3. Association Analysis
Its best-known application is Market Basket Analysis.
- Finds relationships between items
- Example: customers who buy coffee also buy biscuits
Key measures:
- Support: the fraction of all transactions in which the items appear together
- Confidence: for a rule A → B, the fraction of transactions containing A that also contain B
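Support and confidence can be computed directly from a list of transactions. The sketch below does this for the rule coffee → biscuits. The transactions are invented for illustration, and the function names are hypothetical helpers, not part of any library.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"coffee", "biscuits"},
    {"coffee", "biscuits", "sugar"},
    {"coffee"},
    {"tea", "biscuits"},
    {"coffee", "sugar"},
]

def support(itemset, rows):
    """Fraction of all transactions that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)

def confidence(antecedent, consequent, rows):
    """Of the transactions containing antecedent, the share that also contain consequent."""
    return support(antecedent | consequent, rows) / support(antecedent, rows)

rule_support = support({"coffee", "biscuits"}, transactions)
rule_confidence = confidence({"coffee"}, {"biscuits"}, transactions)
```

In this toy data, coffee and biscuits appear together in 2 of 5 transactions (support 0.4), and 2 of the 4 coffee transactions also contain biscuits (confidence 0.5).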
4. Classification
Assigns each data item to one of a set of predefined categories
Uses models like:
- Decision Trees
- If-then rules
- Neural Networks
Example: Classifying emails as spam or not spam
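To make the idea concrete, here is a minimal sketch of the simplest possible decision tree, a one-level tree (often called a decision stump) that learns a single if-then rule from labeled examples. The training data, the single feature (number of links in an email), and the assumption that spam lies above the threshold are all made up for illustration.

```python
# Hypothetical training data: (number of links in the email, label).
training = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
            (5, "spam"), (7, "spam"), (9, "spam")]

def fit_stump(samples):
    """Learn a one-level decision tree: pick the threshold with the fewest training errors."""
    values = sorted({x for x, _ in samples})
    best_threshold, best_errors = None, len(samples) + 1
    # Candidate thresholds sit midway between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        # Assumption of this sketch: emails above the threshold are called spam.
        errors = sum((x > threshold) != (label == "spam") for x, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def classify(links, threshold):
    """Apply the learned if-then rule to a new email."""
    return "spam" if links > threshold else "not spam"

threshold = fit_stump(training)
```

After training, classify(6, threshold) labels a new email with six links using the learned rule. Real classifiers use many features and more robust learning methods.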
5. Prediction
Predicts missing or future values
Types:
- Numeric Prediction → Predict numbers (e.g., sales forecast using regression)
- Class Prediction → Predict categories (e.g., customer type)
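Numeric prediction with regression can be sketched as follows. The code fits a straight line to past sales by ordinary least squares and extends it one quarter ahead. The sales figures are invented, and a straight-line trend is an assumption made for the example.

```python
# Hypothetical quarterly sales (in thousands), invented for illustration only.
quarters = [1, 2, 3, 4]
sales = [10.0, 13.0, 13.0, 16.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(quarters, sales)
# Forecast for quarter 5 by extending the fitted line.
next_quarter_sales = slope * 5 + intercept
```

For this data the fitted line rises by about 1.8 per quarter, which gives a forecast of about 17.5 for quarter 5.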
6. Cluster Analysis
Groups similar data together
No predefined categories
Example:
Grouping customers based on buying behavior
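One common clustering method is k-means. The sketch below runs a bare-bones version on a single feature (monthly spend) with two clusters. The spend values, the choice of two clusters, and the starting centers are assumptions made for illustration.

```python
# Hypothetical monthly spend per customer, invented for illustration only.
spend = [12.0, 15.0, 14.0, 90.0, 95.0, 88.0]

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one feature: assign to the nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Start from the smallest and largest values (a simple choice for this sketch).
centers, clusters = kmeans_1d(spend, centers=[min(spend), max(spend)])
```

No labels are supplied. The low spenders and high spenders end up in separate groups purely because their values are close to each other.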
7. Outlier Analysis
Identifies unusual or abnormal data
Why important:
- Helps detect errors or fraud
- Improves data quality
Example:
A sudden, unusually large transaction in a bank account
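There are many ways to flag outliers. The sketch below uses one of them, the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). The constant 0.6745 and the cutoff 3.5 are the conventional values for this method, and the transaction amounts are invented for illustration.

```python
from statistics import median

# Hypothetical transaction amounts for one bank account.
amounts = [120.0, 95.0, 130.0, 110.0, 105.0, 5000.0]

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median) exceeds threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # No spread to measure against in this simple sketch.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

outliers = find_outliers(amounts)
```

The median is used instead of the mean because a single extreme value shifts the mean and standard deviation a lot, which can hide the very outlier being searched for.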
8. Evolution and Deviation Analysis
Studies how data changes over time
Examples:
- Sales trends over months
- Website traffic growth
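Two simple ways to study change over time are period-to-period growth rates and a moving average that smooths the trend. The sketch below computes both for made-up monthly sales. The figures and the window size of 2 are assumptions for illustration.

```python
# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = [100.0, 110.0, 121.0, 115.0]

def growth_rates(series):
    """Percentage change from each period to the next."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(series, series[1:])]

def moving_average(series, window):
    """Mean of each run of `window` consecutive values, to smooth the trend."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

growth = growth_rates(monthly_sales)
trend = moving_average(monthly_sales, window=2)
```

A negative growth rate after a run of positive ones, as in the last month here, is the kind of deviation this analysis is meant to surface.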
9. Correlation Analysis
Measures the relationship between two variables
Example:
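Sales tend to increase when advertising spend increases (a positive correlation, which by itself does not prove that advertising causes the increase)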
It shows:
- How strongly variables are related
- Whether the relationship is positive or negative
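A standard measure for this is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below computes it by hand for made-up advertising and sales figures. The data is invented for illustration.

```python
from math import sqrt

# Hypothetical advertising spend and sales for five periods.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 15.0, 19.0, 22.0, 27.0]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(advertising, sales)
```

The sign of r gives the direction of the relationship and its absolute value gives the strength. Here r is close to +1, a strong positive relationship.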
Conclusion
Data mining helps in discovering hidden patterns and trends in large
datasets. These insights support better decision-making in areas like
business, healthcare, and finance.