Association Analysis in Data Mining

kumudha


What is Data Mining?

Data mining is the process of extracting useful information from large amounts of raw data.
Today, with the help of the internet, huge volumes of data are collected. Data mining helps
convert this raw data into meaningful insights that businesses can use to make better decisions.

What is Association Analysis?

Association Analysis is a technique used to find relationships between items that occur together in a dataset. It is often applied as Market Basket Analysis, which studies the items customers buy together.

It helps answer questions like:
“Which items are usually bought together?”

Basic Representation

Association rules are written as:
C ⇒ D
C (Antecedent) → Items already present in the transaction
D (Consequent) → Items that are likely to appear along with C

Simple Example

In a supermarket:

If customers often buy milk and bread together, we write:
{Milk} ⇒ {Bread}

This means:

If a customer buys milk, they are likely to buy bread too.

Business Use

Place related items near each other
Offer combo discounts
Increase sales

More Examples

{Diapers} ⇒ {Milk}
{Watch, Perfume} ⇒ {Ring}

Key Components of Association Analysis

1. Transactional Data

Each record of items purchased together is called a transaction.

Example:

Transaction 1 → Milk, Bread
Transaction 2 → Chips, Coke
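In code, a transaction maps naturally onto a set of item names, and asking whether a transaction contains an itemset becomes a subset test. The following is a minimal Python sketch, assuming items are plain strings; the helper `contains` is a hypothetical name chosen for illustration.

```python
# Each transaction is a frozenset of item names (order inside a basket does not matter).
transactions = [
    frozenset({"Milk", "Bread"}),  # Transaction 1
    frozenset({"Chips", "Coke"}),  # Transaction 2
]

def contains(transaction: frozenset, itemset: set) -> bool:
    """Return True if every item of `itemset` appears in `transaction`."""
    return set(itemset) <= transaction

print(contains(transactions[0], {"Milk"}))          # True
print(contains(transactions[1], {"Milk", "Coke"}))  # False
```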

2. Itemset

A group of one or more items.

Example:

{Milk, Honey} → 2-itemset
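An itemset with k items is called a k-itemset. As a small sketch, the hypothetical helper `k_itemsets` below lists every k-itemset contained in one transaction using the standard-library `itertools.combinations`.

```python
from itertools import combinations

def k_itemsets(transaction, k):
    """Return every k-item subset of `transaction` as a frozenset.

    Items are sorted first so the output order is deterministic.
    """
    return [frozenset(c) for c in combinations(sorted(transaction), k)]

basket = {"Milk", "Honey", "Bread"}
for itemset in k_itemsets(basket, 2):
    print(sorted(itemset))
# ['Bread', 'Honey']
# ['Bread', 'Milk']
# ['Honey', 'Milk']
```

A transaction with n items contains C(n, k) different k-itemsets, which is why frequent-itemset algorithms avoid enumerating them blindly.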

3. Frequent Patterns

Itemsets whose items appear together in many transactions, often enough to meet a chosen minimum support threshold.

Example:

Butter & Jam often bought together
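A simple way to spot items that are often bought together is to count how many transactions contain each pair. The sketch below does this with `collections.Counter`; the four baskets are made-up illustration data, not taken from the original.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, for illustration only.
transactions = [
    {"Butter", "Jam", "Bread"},
    {"Butter", "Jam"},
    {"Milk", "Bread"},
    {"Butter", "Jam", "Milk"},
]

pair_counts = Counter()
for basket in transactions:
    # Count each unordered pair once per transaction (sorted for a stable key).
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(1):
    print(pair, count)  # ('Butter', 'Jam') 3
```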

4. Sequential Patterns

Items bought in a specific order, typically across a customer's successive purchases.

Example:

Buy Computer → then Antivirus
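Unlike an itemset, a sequential pattern depends on order. A minimal check, assuming each customer's purchase history is a time-ordered list of items; `follows` is a hypothetical helper name and the histories are made-up examples.

```python
def follows(history, first, then):
    """Return True if `then` is bought at some point after `first` in `history`."""
    try:
        start = history.index(first)  # position of the earliest purchase of `first`
    except ValueError:
        return False  # `first` was never bought
    return then in history[start + 1:]

# Hypothetical purchase histories, oldest purchase first.
customers = {
    "c1": ["Computer", "Mouse", "Antivirus"],
    "c2": ["Antivirus", "Computer"],
    "c3": ["Computer"],
}
matches = [c for c, h in customers.items() if follows(h, "Computer", "Antivirus")]
print(matches)  # ['c1']
```

Customer c2 bought both items, but in the wrong order, so it does not match the pattern Computer → Antivirus.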

5. Association Rule

A rule that shows relationships:

IF {C} THEN {D}
Left side (C) → Condition
Right side (D) → Result
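A rule can be stored as a pair of non-empty, disjoint itemsets. The hypothetical `Rule` class below (not from any particular library) simply captures the IF/THEN structure and rejects rules whose two sides share items, since an item cannot meaningfully predict itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # left side (C): the condition
    consequent: frozenset  # right side (D): the predicted result

    def __post_init__(self):
        if not self.antecedent or not self.consequent:
            raise ValueError("both sides of a rule must be non-empty")
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must not share items")

    def __str__(self):
        left = ", ".join(sorted(self.antecedent))
        right = ", ".join(sorted(self.consequent))
        return f"IF {{{left}}} THEN {{{right}}}"

rule = Rule(frozenset({"Watch", "Perfume"}), frozenset({"Ring"}))
print(rule)  # IF {Perfume, Watch} THEN {Ring}
```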

Important Measures in Association Analysis

6. Support

Support tells how often an itemset appears in the dataset: the fraction of all transactions that contain it.

Formula:

Support(C) = (Number of transactions containing C) / (Total transactions)

Example:

Transactions:
T1: Biscuits, Chips, Coke
T2: Biscuits, Chips
T3: Bread, Butter
T4: Biscuits, Coke
Biscuits appear in 3 out of 4 transactions (T1, T2, T4):
Support(Biscuits) = 3/4 = 0.75 (75%)
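The support formula translates directly into code. A minimal sketch using the four transactions above; the function name `support` is our own choice, not a library API.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

transactions = [
    {"Biscuits", "Chips", "Coke"},  # first transaction
    {"Biscuits", "Chips"},
    {"Bread", "Butter"},
    {"Biscuits", "Coke"},
]
print(support({"Biscuits"}, transactions))  # 0.75
```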

7. Confidence

Confidence shows how reliable a rule is: among the transactions that contain C, it is the fraction that also contain D.

Formula:

Confidence(C ⇒ D) = Support(C ∩ D) / Support(C)
Here Support(C ∩ D) is the fraction of transactions that contain both C and D.

Example:

Rule: {Chips} ⇒ {Coke}
Support(Chips & Coke) = 1/4 = 0.25 (only the first transaction contains both)
Support(Chips) = 2/4 = 0.5
Confidence = 0.25 / 0.5 = 0.5 (50%)

This means:

In half of the transactions that contain chips, coke is also bought.
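Confidence can be computed from two support values. In this sketch (function names are our own), the set union `C | D` is the itemset holding both sides, which is the quantity written Support(C ∩ D) in the formula, i.e. transactions containing both C and D. On the four transactions above it gives 0.5 for {Chips} ⇒ {Coke}, because chips and coke appear together only in the first transaction.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(C => D): share of transactions with C that also contain D."""
    both = set(antecedent) | set(consequent)  # itemset containing C and D
    return support(both, transactions) / support(antecedent, transactions)

transactions = [
    {"Biscuits", "Chips", "Coke"},
    {"Biscuits", "Chips"},
    {"Bread", "Butter"},
    {"Biscuits", "Coke"},
]
print(confidence({"Chips"}, {"Coke"}, transactions))  # 0.5
```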

8. Lift

Lift shows how much more (or less) often C and D appear together than they would if they were bought independently, that is, by random chance.

Formula:

Lift = Support(C ∩ D) / (Support(C) × Support(D))
Interpretation:
Lift > 1 → Positive relationship
Lift = 1 → No relation (C and D are independent)
Lift < 1 → Negative relationship

Example:

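Rule: {Chips} ⇒ {Coke}
Support(Chips & Coke) = 1/4 = 0.25
Support(Chips) = 2/4 = 0.5
Support(Coke) = 2/4 = 0.5
Lift = 0.25 / (0.5 × 0.5) = 1.0
Lift equals 1, so chips and coke show no relation in this small dataset: they are bought together exactly as often as chance would predict.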
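A sketch of the lift formula on the same four transactions (function names are our own). The denominator is the joint support we would expect if C and D were independent. For comparison, it also scores {Biscuits} ⇒ {Chips}, which comes out above 1: Biscuits and Chips occur together in two of four transactions, more than the 0.75 × 0.5 = 0.375 expected by chance.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Lift(C => D) = Support(C and D together) / (Support(C) * Support(D))."""
    both = set(antecedent) | set(consequent)  # itemset holding both sides
    expected = support(antecedent, transactions) * support(consequent, transactions)
    return support(both, transactions) / expected

transactions = [
    {"Biscuits", "Chips", "Coke"},
    {"Biscuits", "Chips"},
    {"Bread", "Butter"},
    {"Biscuits", "Coke"},
]
print(lift({"Chips"}, {"Coke"}, transactions))                # 1.0
print(round(lift({"Biscuits"}, {"Chips"}, transactions), 2))  # 1.33
```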

9. Apriori Algorithm

Apriori is used to find frequent itemsets and generate rules.

Key Idea:

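If an itemset is frequent, all its subsets must also be frequent. Equivalently, if an itemset is infrequent, every superset of it is also infrequent, so those supersets never need to be counted.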

Example:

Transactions:
T1: A, B, C
T2: A, C, D
T3: B, C, D
T4: A, D, E
T5: B, C, E

Possible rules:
A ⇒ D
C ⇒ A
A ⇒ C
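To see how strong these candidate rules are, it helps to score them. The sketch below computes support and confidence for the three rules listed, on transactions T1 to T5, using the Support and Confidence formulas from the earlier sections; the helper name `support` is our own.

```python
# The five transactions T1..T5 from the example above.
transactions = [
    {"A", "B", "C"},  # T1
    {"A", "C", "D"},  # T2
    {"B", "C", "D"},  # T3
    {"A", "D", "E"},  # T4
    {"B", "C", "E"},  # T5
]

def support(itemset):
    """Fraction of the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

results = {}
for lhs, rhs in [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"})]:
    sup = support(lhs | rhs)   # support of the whole rule
    conf = sup / support(lhs)  # confidence of lhs => rhs
    name = f"{''.join(sorted(lhs))} => {''.join(sorted(rhs))}"
    results[name] = (sup, conf)
    print(f"{name}: support={sup:.2f}, confidence={conf:.2f}")
# A => D: support=0.40, confidence=0.67
# C => A: support=0.40, confidence=0.50
# A => C: support=0.40, confidence=0.67
```

All three rules share support 0.4 (two of five transactions), but C ⇒ A is weaker than A ⇒ C: C appears in four transactions and A accompanies it in only two of them.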

Apriori helps:

Find frequent combinations
Generate strong association rules
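The level-wise search can be sketched in plain Python. This is a simplified, unoptimized version of Apriori's join, prune and count loop written for readability; production implementations generate candidates and count supports far more carefully. The function name `apriori` and the threshold 0.4 are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Simplified level-wise search: join frequent k-itemsets into (k+1)-item
    candidates, prune candidates that have an infrequent k-subset, then count.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        # Join: unions of two frequent k-itemsets that have exactly k + 1 items.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must be frequent (downward closure).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Count: keep the candidates that meet the minimum support.
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Transactions T1..T5 from the example; MinSupport = 0.4 is an assumed threshold.
transactions = [
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"}, {"A", "D", "E"}, {"B", "C", "E"},
]
result = apriori(transactions, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
```

With these assumptions it finds the five single items plus AC, AD, BC and CD. ABC and BCD are pruned without being counted, and ACD is counted but occurs only in T2, so it is not frequent.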

10. MinSupport

Minimum support threshold.
Only itemsets with support ≥ MinSupport are kept as frequent itemsets.

11. MinConfidence

Minimum confidence threshold.
Only rules with confidence ≥ MinConfidence are accepted.
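The two thresholds work together: MinSupport selects the frequent itemsets, and MinConfidence filters the rules built from them. The sketch below applies both to T1 to T5; the values 0.4 and 0.6 are assumptions, and the frequent itemsets are found by brute force only because the example is tiny.

```python
from itertools import combinations

# Transactions T1..T5 from the Apriori example.
transactions = [
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"}, {"A", "D", "E"}, {"B", "C", "E"},
]
MIN_SUPPORT = 0.4     # assumed threshold: at least 2 of 5 transactions
MIN_CONFIDENCE = 0.6  # assumed threshold

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Frequent itemsets by brute force (fine for 5 items; Apriori avoids this).
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Split each frequent itemset into C => D and keep rules meeting MinConfidence.
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):  # single items yield no rules
        for lhs in map(frozenset, combinations(sorted(itemset), r)):
            conf = s / frequent[lhs]  # subsets of frequent itemsets are frequent
            if conf >= MIN_CONFIDENCE:
                rules.append((lhs, itemset - lhs, conf))

for lhs, rhs, conf in sorted(rules, key=lambda x: (sorted(x[0]), sorted(x[1]))):
    print(f"{''.join(sorted(lhs))} => {''.join(sorted(rhs))}  confidence={conf:.2f}")
```

Six rules survive here. C ⇒ A is one of the possible rules listed in the Apriori example, but its confidence is only 0.5, so this MinConfidence rejects it.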

12. Pruning

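Pruning removes candidate itemsets that cannot be frequent: a candidate with an infrequent subset is dropped before counting, and a candidate whose support is below MinSupport is dropped after counting.
  • Reduces the number of candidates that must be counted
  • Makes the algorithm efficient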
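The subset-based pruning step can be shown on its own. Starting from the frequent 2-itemsets of the T1 to T5 example (AC, AD, BC and CD, which is what those transactions give under an assumed MinSupport of 0.4), the hypothetical helper below joins them into 3-item candidates and discards any candidate with an infrequent 2-item subset, without reading the transactions at all.

```python
from itertools import combinations

# Frequent 2-itemsets of the T1..T5 example at an assumed MinSupport of 0.4.
frequent_pairs = {frozenset(p) for p in [("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]}

def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-itemsets, then split them into
    candidates to keep and candidates pruned for having an infrequent k-subset."""
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    kept, pruned = set(), set()
    for cand in joined:
        if all(frozenset(s) in frequent_k for s in combinations(cand, k)):
            kept.add(cand)
        else:
            pruned.add(cand)
    return kept, pruned

kept, pruned = generate_candidates(frequent_pairs, 2)
print("kept:", sorted("".join(sorted(c)) for c in kept))      # kept: ['ACD']
print("pruned:", sorted("".join(sorted(c)) for c in pruned))  # pruned: ['ABC', 'BCD']
```

ABC is dropped because AB is infrequent, and BCD because BD is infrequent. ACD survives pruning but still has to be counted; it occurs only in T2, so its support (0.2) is below the threshold.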