Association Analysis in Data Mining
What is Data Mining?
Data mining is the process of extracting useful information from large amounts of raw data.
Today, huge volumes of data are collected, much of it through the internet. Data mining helps convert this raw data into meaningful insights that businesses can use to make better decisions.
What is Association Analysis?
Association Analysis (often called Market Basket Analysis, after its best-known application) is a technique used to find relationships between items that occur together in a dataset.
It helps answer questions like:
“Which items are usually bought together?”
Basic Representation
Association rules are written as:
C ⇒ D
C (Antecedent) → Items already present in the transaction (the “if” part)
D (Consequent) → Items that are likely to appear together with C (the “then” part)
Simple Example
In a supermarket:
If customers often buy milk and bread together, we write:
{Milk} ⇒ {Bread}
This means:
If a customer buys milk, they are likely to buy bread too.
Business Use
Place related items near each other
Offer combo discounts
Increase sales
More Examples
{Diapers} ⇒ {Milk}
{Watch, Perfume} ⇒ {Ring}
Key Components of Association Analysis
1. Transactional Data
Each record of items purchased together is called a transaction.
Example:
Transaction 1 → Milk, Bread
Transaction 2 → Chips, Coke
2. Itemset
A group of one or more items. An itemset containing k items is called a k-itemset.
Example:
{Milk, Honey} → 2-itemset
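Transactions and itemsets map naturally onto Python sets. The sketch below is a minimal illustration, assuming each transaction is stored as a set of item names; the variable names are hypothetical. A `frozenset` is used for the itemset so it could later serve as a dictionary key when counting.

```python
# Each transaction is a set of items bought together (order does not matter).
transactions = [
    {"Milk", "Bread"},  # Transaction 1
    {"Chips", "Coke"},  # Transaction 2
]

# An itemset is a group of one or more items. A frozenset is immutable and
# hashable, so it can be used as a dictionary key when counting itemsets.
itemset = frozenset({"Milk", "Honey"})
k = len(itemset)  # number of items: this is a 2-itemset
print(f"{sorted(itemset)} is a {k}-itemset")

# A transaction "contains" an itemset when the itemset is a subset of it.
print(frozenset({"Milk"}) <= transactions[0])  # True
print(itemset <= transactions[0])              # False: no Honey in Transaction 1
```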
3. Frequent Patterns
Itemsets that appear together in many transactions, that is, whose support is at least a chosen minimum (MinSupport).
Example:
Butter and Jam are often bought together.
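One simple way to spot such patterns is to count how often each pair of items occurs in the same transaction. This is a sketch only: the transactions below are made up for illustration, and the cut-off of 2 transactions is an assumed threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, made up only for illustration.
transactions = [
    {"Butter", "Jam", "Bread"},
    {"Butter", "Jam"},
    {"Milk", "Bread"},
    {"Butter", "Jam", "Milk"},
]

# Count how many transactions contain each pair of items.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):  # sorted so each pair has one spelling
        pair_counts[pair] += 1

# Treat a pair as a frequent pattern if it appears in at least 2 transactions
# (this threshold is an assumption for the example).
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= 2}
print(frequent_pairs)  # {('Butter', 'Jam'): 3}
```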
4. Sequential Patterns
Items bought in a specific order over time, usually across separate transactions.
Example:
Buy Computer → then Antivirus
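A sequential pattern can be checked against one customer's purchase history by testing whether the pattern's items occur in the same order, possibly with other purchases in between. This is a simplified sketch that assumes each purchase is a single item; the history and the helper name `contains_in_order` are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """Return True if the items of `pattern` appear in `sequence` in the same
    order (other purchases may occur in between)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so each item of the
    # pattern must be found after the previous one.
    return all(item in it for item in pattern)

# Hypothetical purchase history of one customer, oldest first.
history = ["Computer", "Mouse", "Antivirus"]
print(contains_in_order(history, ["Computer", "Antivirus"]))  # True
print(contains_in_order(history, ["Antivirus", "Computer"]))  # False
```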
5. Association Rule
A rule that expresses a relationship between two itemsets:
IF {C} THEN {D}
Left side (C) → Condition
Right side (D) → Result
Important Measures in Association Analysis
6. Support
Support tells how often an itemset appears in the dataset, as a fraction of all transactions.
Formula:
Support(C) = (Number of transactions containing C) / (Total transactions)
Example:
Transactions:
Biscuits, Chips, Coke
Biscuits, Chips
Bread, Butter
Biscuits, Coke
Biscuits appear in 3 out of 4 transactions:
Support(Biscuits) = 3/4 = 0.75 (75%)
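The support calculation above can be written as a short Python function. This is a sketch, assuming transactions are stored as sets; the helper name `support` is hypothetical. A transaction counts towards an itemset only if it contains every item of that itemset.

```python
# The four example transactions from above.
transactions = [
    {"Biscuits", "Chips", "Coke"},
    {"Biscuits", "Chips"},
    {"Bread", "Butter"},
    {"Biscuits", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)

print(support({"Biscuits"}, transactions))  # 0.75
```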
7. Confidence
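Confidence measures how reliable a rule is: among the transactions that contain C, it is the fraction that also contain D.
Formula:
Confidence(C ⇒ D) = Support(C ∩ D) / Support(C)
Here Support(C ∩ D) is the fraction of transactions that contain all items of both C and D.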
Example:
Rule: {Chips} ⇒ {Coke}
Support(Chips ∩ Coke) = 1/4 = 0.25 (only the first transaction contains both)
Support(Chips) = 2/4 = 0.5
Confidence = 0.25 / 0.5 = 0.5 (50%)
This means:
Half of the transactions that contain chips also contain coke.
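A sketch of the confidence formula in Python, using the same four transactions. The helper names are hypothetical. Support(C ∩ D) is computed as the support of the combined itemset C ∪ D, since a transaction contains both C and D exactly when it contains all of their items. Only the first transaction contains both chips and coke, so the result is 0.5.

```python
# The four example transactions from the Support section.
transactions = [
    {"Biscuits", "Chips", "Coke"},
    {"Biscuits", "Chips"},
    {"Bread", "Butter"},
    {"Biscuits", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(C => D): share of transactions with C that also contain D."""
    # A transaction contains both C and D exactly when it contains C ∪ D.
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(confidence({"Chips"}, {"Coke"}, transactions))  # 0.5
```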
8. Lift
Lift shows how strong the relationship is compared with what would be expected if C and D were bought independently of each other.
Formula:
Lift = Support(C ∩ D) / (Support(C) × Support(D)) = Confidence(C ⇒ D) / Support(D)
Interpretation:
Lift > 1 → Positive relationship
Lift = 1 → No relationship (C and D are independent)
Lift < 1 → Negative relationship
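Example (rule {Chips} ⇒ {Coke}):
Support(Chips ∩ Coke) = 1/4 = 0.25
Support(Chips) = 2/4 = 0.5
Support(Coke) = 2/4 = 0.5
Lift = 0.25 / (0.5 × 0.5) = 1.0
Lift equals 1, so in this small dataset chips and coke are bought independently of each other.
By contrast, for {Chips} ⇒ {Biscuits}: Lift = 0.5 / (0.5 × 0.75) ≈ 1.33 > 1, so chips and biscuits are positively related.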
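The lift formula can be sketched the same way on the four transactions above. The helper names are hypothetical. {Chips} ⇒ {Coke} gives a lift of 1.0 (independent), while {Chips} ⇒ {Biscuits} gives about 1.33 (positively related), because in this data chips never appear without biscuits.

```python
# The four example transactions from the Support section.
transactions = [
    {"Biscuits", "Chips", "Coke"},
    {"Biscuits", "Chips"},
    {"Bread", "Butter"},
    {"Biscuits", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Lift = Support(C ∩ D) / (Support(C) × Support(D))."""
    both = set(antecedent) | set(consequent)  # transactions holding C and D
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )

print(lift({"Chips"}, {"Coke"}, transactions))                # 1.0
print(round(lift({"Chips"}, {"Biscuits"}, transactions), 2))  # 1.33
```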
9. Apriori Algorithm
Apriori is used to find frequent itemsets and generate rules.
Key Idea:
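If an itemset is frequent, all its subsets must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so they never need to be counted.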
Example:
Transactions:
T1: A, B, C
T2: A, C, D
T3: B, C, D
T4: A, D, E
T5: B, C, E
Possible rules:
A ⇒ D
C ⇒ A
A ⇒ C
Apriori helps:
Find frequent combinations
Generate strong association rules
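A compact level-wise version of Apriori for the five transactions above is sketched below. It is an illustration, not an optimized implementation: the function name `apriori` and the minimum support of 0.4 (2 of 5 transactions) are assumptions. Each round keeps the candidates that meet the minimum support, joins them into larger candidates, and drops any candidate that has an infrequent subset, which is the key idea stated above.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C"},  # T1
    {"A", "C", "D"},  # T2
    {"B", "C", "D"},  # T3
    {"A", "D", "E"},  # T4
    {"B", "C", "E"},  # T5
]

def apriori(transactions, min_support):
    """Return {frequent itemset: support} using a level-wise Apriori search."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every single item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the minimum support.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets ...
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

frequent = apriori(transactions, min_support=0.4)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

With these assumptions, the frequent itemsets are the five single items plus {A, C}, {A, D}, {B, C} and {C, D}; the only 3-item candidate left after pruning, {A, C, D}, appears in just one transaction and is discarded.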
10. MinSupport
Minimum support threshold.
Only itemsets with support ≥ MinSupport are considered.
11. MinConfidence
Minimum confidence threshold.
Only rules with confidence ≥ MinConfidence are accepted.
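The two thresholds can be applied to the possible rules from the Apriori example (A ⇒ D, C ⇒ A, A ⇒ C). The sketch assumes MinSupport = 0.4 and MinConfidence = 0.6, values chosen only for illustration. With these settings, A ⇒ D and A ⇒ C (confidence 2/3) are accepted, while C ⇒ A (confidence 0.5) is rejected.

```python
# T1 to T5 from the Apriori example.
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]

MIN_SUPPORT = 0.4     # assumed MinSupport (2 of 5 transactions)
MIN_CONFIDENCE = 0.6  # assumed MinConfidence

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

candidate_rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"})]

accepted = []
for c, d in candidate_rules:
    rule_support = support(c | d)                # C and D bought together
    rule_confidence = rule_support / support(c)  # Confidence(C => D)
    ok = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
    if ok:
        accepted.append((c, d))
    print(f"{','.join(sorted(c))} => {','.join(sorted(d))}: "
          f"support={rule_support:.2f}, confidence={rule_confidence:.2f}, "
          f"{'accepted' if ok else 'rejected'}")
```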
12. Pruning
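Pruning removes itemsets that do not meet minimum support, and by the Apriori property it also discards any larger candidate that contains such an itemset before its support is counted.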
- Reduces computation
- Makes algorithm efficient
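The pruning step can be isolated as below. This sketch assumes MinSupport = 0.4 on the T1 to T5 transactions, for which the frequent 2-itemsets are {A, C}, {A, D}, {C, D} (2 transactions each) and {B, C} (3 transactions); the helper name `generate_and_prune` is hypothetical. Joining these gives three 3-item candidates, and two of them are pruned without scanning the transactions because they contain an infrequent pair ({A, B} or {B, D}).

```python
from itertools import combinations

# Frequent 2-itemsets of T1 to T5 at MinSupport = 0.4 (given as input here).
frequent_2 = {frozenset(p) for p in ["AC", "AD", "BC", "CD"]}

def generate_and_prune(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-candidates, then split them into
    kept candidates and pruned ones (those with an infrequent k-subset)."""
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    kept, pruned = set(), set()
    for c in joined:
        if all(frozenset(s) in frequent_k for s in combinations(c, k)):
            kept.add(c)    # every k-subset is frequent: must still be counted
        else:
            pruned.add(c)  # some k-subset is infrequent: cannot be frequent
    return kept, pruned

kept, pruned = generate_and_prune(frequent_2, 2)
print("kept:", sorted("".join(sorted(c)) for c in kept))      # kept: ['ACD']
print("pruned:", sorted("".join(sorted(c)) for c in pruned))  # pruned: ['ABC', 'BCD']
```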