Apriori Algorithm

shareef

The Apriori Algorithm is a popular data mining algorithm used to find relationships between items in a dataset. It helps identify patterns showing which items are frequently purchased together.

For example, in a supermarket, customers who buy pizza often buy soft drinks and breadsticks as well. Because of this pattern, shops create combo offers. This makes shopping easier for customers and increases store sales.

Similarly, in large stores like Big Bazaar, products such as biscuits, chips, and chocolates are often placed together because customers usually buy them together. These relationships are expressed as association rules, and the Apriori algorithm helps find these rules.


What is Apriori Algorithm?

The Apriori Algorithm is used to identify frequent itemsets (groups of items that appear together frequently in transactions) and to generate association rules from those itemsets.

It works on large databases containing many transactions, such as customer purchase records.

For example, if many customers buy biscuits and chocolates together, the algorithm identifies this pattern, and stores can use it for product placement or recommendations.

Components of Apriori Algorithm

The Apriori algorithm mainly uses three important measures:
  • Support
  • Confidence
  • Lift
Let us understand them using an example.

Suppose a supermarket has 4000 transactions.

  • 400 transactions include Biscuits
  • 600 transactions include Chocolate
  • 200 transactions include both Biscuits and Chocolate

1. Support

Support shows how frequently an item appears in the dataset.

Formula
Support = (Number of transactions containing the item) / (Total number of transactions)

Example:
Support (Biscuits)
= 400 / 4000
= 10%

This means 10% of all transactions contain biscuits.
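
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `support` is an illustrative choice and not part of any library.

```python
def support(item_count: int, total_transactions: int) -> float:
    """Fraction of all transactions that contain the item (or itemset)."""
    return item_count / total_transactions

# Counts taken from the supermarket example.
support_biscuits = support(400, 4000)
support_both = support(200, 4000)   # Biscuits and Chocolate together

print(f"Support(Biscuits) = {support_biscuits:.0%}")
print(f"Support(Biscuits, Chocolate) = {support_both:.0%}")
```

The same function applies to an itemset: pass the number of transactions that contain every item in the set.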

2. Confidence

Confidence measures how often the second item is purchased in transactions that contain the first item.

Formula
Confidence = (Transactions containing both items) / (Transactions containing the first item)

Example:
Confidence (Biscuits → Chocolate)
= 200 / 400
= 50%

This means 50% of customers who bought biscuits also bought chocolate.
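
Confidence depends on the direction of the rule, which a short Python sketch makes visible. The function name `confidence` is hypothetical; the counts are the ones from the supermarket example.

```python
def confidence(both_count: int, first_item_count: int) -> float:
    """Share of transactions with the first item that also contain the second."""
    return both_count / first_item_count

conf_b_to_c = confidence(200, 400)   # Biscuits -> Chocolate
conf_c_to_b = confidence(200, 600)   # Chocolate -> Biscuits

print(f"Confidence(Biscuits -> Chocolate) = {conf_b_to_c:.1%}")
print(f"Confidence(Chocolate -> Biscuits) = {conf_c_to_b:.1%}")
```

The two values differ because the denominator changes with the first item of the rule.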

3. Lift

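Lift measures how much more likely the second item is to be bought when the first item is in the basket, compared with how often the second item is bought overall.

Formula
Lift (A → B) = Confidence (A → B) / Support (B)

Example:
Support (Chocolate) = 600 / 4000 = 15%
Lift (Biscuits → Chocolate) = 50 / 15 ≈ 3.33

This means customers who buy biscuits are about 3.33 times as likely to buy chocolate as customers in general.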

Interpretation:
Lift = 1 → No relationship
Lift > 1 → Positive relationship
Lift < 1 → Negative relationship
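
The following sketch computes lift for the supermarket counts using the standard definition, in which the confidence of the rule is divided by the support of the second item (Chocolate). The helper `interpret` and its tolerance are assumptions added for illustration.

```python
total = 4000
biscuits, chocolate, both = 400, 600, 200

support_chocolate = chocolate / total            # 0.15
confidence_b_to_c = both / biscuits              # 0.50
lift = confidence_b_to_c / support_chocolate

# Equivalent symmetric form: P(A and B) / (P(A) * P(B))
lift_symmetric = (both / total) / ((biscuits / total) * (chocolate / total))

def interpret(lift_value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases listed above."""
    if abs(lift_value - 1) <= tol:
        return "no relationship"
    return "positive relationship" if lift_value > 1 else "negative relationship"

print(f"Lift = {lift:.2f} ({interpret(lift)})")
```

Because the symmetric form gives the same number, lift does not depend on the direction of the rule, unlike confidence.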

History of Apriori Algorithm

The Apriori Algorithm was introduced in 1994 by Rakesh Agrawal and Ramakrishnan Srikant.

The name Apriori comes from the idea of using prior knowledge of frequent itemsets to find larger patterns.

The algorithm first finds frequent k-itemsets, and then uses them to generate candidate (k+1)-itemsets.

Applications of Apriori Algorithm


The Apriori algorithm is used in many fields.

1. Mobile E-Commerce

Online shopping platforms use it to recommend products frequently bought together, improving customer experience and increasing sales.

2. Education

Educational institutions use it to find patterns in student data such as grades, performance, and demographic information.

3. Forestry

It helps analyze and manage environmental data related to plants and wildlife.

4. Medical Field

Hospitals use it to analyze patient records and identify patterns in medical data.

5. Market Basket Analysis

Retail stores analyze customer purchase patterns to understand which products are bought together.

6. Website Design

It helps analyze user navigation patterns to improve website structure and user experience.

7. Tourism Industry

Tour companies analyze booking patterns to understand tourist preferences.

How Apriori Algorithm Works

Let us consider a simple example.

Products:

P = {Rice, Pulse, Oil, Milk, Apple}
The database contains several transactions showing which products were purchased.

Assumptions of Apriori Algorithm
  • All subsets of a frequent itemset must also be frequent.
  • If an itemset is infrequent, all its supersets will also be infrequent.
  • A minimum support threshold is set.
Assume minimum support = 50%.
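
The first two assumptions are what allow pruning: a candidate can be discarded without counting it if any of its smaller subsets is already known to be infrequent. A minimal sketch of that check follows; the function name and the items A to D are hypothetical.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if any (k-1)-subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent pairs over items A, B, C, D.
frequent_pairs = {
    frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
}

keep_abc = not has_infrequent_subset(frozenset("ABC"), frequent_pairs)
keep_bcd = not has_infrequent_subset(frozenset("BCD"), frequent_pairs)
print(keep_abc, keep_bcd)
```

Here ABC is kept because AB, AC, and BC are all frequent, while BCD is dropped because the pair CD is missing from the frequent set.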

Step 1: Find Frequent Single Items

Create a frequency table.

Product    Frequency
Rice       4
Pulse      5
Oil        4
Milk       4

Only items whose support is at or above the threshold are selected.
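
Step 1 can be sketched in Python. The article does not list the individual transactions, so the six transactions below are an assumption, chosen only so that the item counts match the frequency table.

```python
from collections import Counter

# Hypothetical transactions (not from the article), built to match the table.
transactions = [
    {"Rice", "Pulse", "Oil", "Milk"},
    {"Rice", "Pulse", "Oil", "Apple"},
    {"Rice", "Pulse", "Oil"},
    {"Rice", "Pulse", "Milk"},
    {"Pulse", "Oil", "Milk"},
    {"Milk", "Apple"},
]
min_support = 0.5
min_count = min_support * len(transactions)   # 3 of 6 transactions

# Count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)

# Keep only items that meet the minimum support.
frequent_items = {item: n for item, n in item_counts.items() if n >= min_count}
print(frequent_items)
```

With this assumed data, Apple appears in only 2 transactions and is dropped, which is consistent with its absence from the table.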

Step 2: Create Item Pairs

Possible pairs (R = Rice, P = Pulse, O = Oil, M = Milk):

RP, RO, RM, PO, PM, OM

Itemset    Frequency
RP         4
RO         3
RM         2
PO         4
PM         3
OM         2

Step 3: Apply Support Threshold

Pairs that appear in at least 3 transactions meet the threshold, so RM and OM are dropped.

Frequent pairs:
RP
RO
PO
PM
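
Steps 2 and 3 can be sketched together: count every pair of frequent items, then apply the threshold. The transactions are hypothetical (the article does not list them) and were chosen to reproduce the pair frequencies shown in Step 2.

```python
from itertools import combinations

# Hypothetical transactions, chosen to match the article's tables.
transactions = [
    {"Rice", "Pulse", "Oil", "Milk"},
    {"Rice", "Pulse", "Oil", "Apple"},
    {"Rice", "Pulse", "Oil"},
    {"Rice", "Pulse", "Milk"},
    {"Pulse", "Oil", "Milk"},
    {"Milk", "Apple"},
]
frequent_items = ["Rice", "Pulse", "Oil", "Milk"]   # result of Step 1
min_count = 3                                       # 50% of 6 transactions

# Step 2: count the transactions that contain both items of each pair.
pair_counts = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}

# Step 3: keep the pairs that meet the threshold.
frequent_pairs = [pair for pair, n in pair_counts.items() if n >= min_count]
print(frequent_pairs)
```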

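Step 4: Generate 3-Itemsets

Possible combinations:
RPO
POM

By the Apriori property, POM can already be ruled out at this point because its subset OM is not frequent. It is counted in the next step only for illustration.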
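
Candidate generation is usually described as a join followed by a prune. The sketch below, with the hypothetical function name `generate_candidates`, joins the frequent pairs from Step 3 and then removes any 3-itemset that has an infrequent pair inside it. A full prune is stricter than simply listing combinations: it also discards POM, because OM is not frequent.

```python
from itertools import combinations

frequent_pairs = {
    frozenset(p)
    for p in [("Rice", "Pulse"), ("Rice", "Oil"), ("Pulse", "Oil"), ("Pulse", "Milk")]
}

def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-itemsets, then prune by the Apriori property."""
    # Join: unions of two frequent k-itemsets that differ in exactly one item.
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    # Prune: every k-subset of a surviving candidate must itself be frequent.
    return {
        c for c in joined
        if all(frozenset(s) in frequent_k for s in combinations(c, k))
    }

candidates = generate_candidates(frequent_pairs, 2)
print([sorted(c) for c in candidates])
```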

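Step 5: Calculate Frequency

Itemset    Frequency
RPO        3
POM        2

Every transaction that contains Oil also contains Pulse (Oil and PO both have a frequency of 4), so RPO occurs exactly as often as RO, and POM exactly as often as OM.

The only frequent 3-itemset is RPO.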
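
The five steps can be combined into one level-by-level loop. The following is a minimal sketch, not a tuned implementation; the function name `apriori` and the six transactions are assumptions (the article does not list its transactions), with the data chosen to match the single-item and pair tables above.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that meets min_support."""
    min_count = min_support * len(transactions)
    # Level 1 candidates: every distinct item as a 1-itemset.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate, then keep those meeting the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets, then prune by the Apriori property.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent

# Hypothetical transactions, chosen to match the article's tables.
transactions = [
    {"Rice", "Pulse", "Oil", "Milk"},
    {"Rice", "Pulse", "Oil", "Apple"},
    {"Rice", "Pulse", "Oil"},
    {"Rice", "Pulse", "Milk"},
    {"Pulse", "Oil", "Milk"},
    {"Milk", "Apple"},
]
result = apriori(transactions, min_support=0.5)
largest = max(result, key=len)
print(sorted(largest))
```

On this assumed data the largest frequent itemset is Rice, Pulse, Oil, and the loop stops once no candidate of the next size survives pruning.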

Improving Apriori Efficiency

Some techniques improve performance.

1. Hash-Based Itemset Counting

Uses hashing to reduce the number of candidate itemsets.
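
One way to picture hash-based counting: while scanning the transactions, every pair is hashed into a small table of bucket counters, and a pair is only kept as a candidate if its bucket reached the minimum count. The sketch below is a simplified illustration under assumptions: the bucket function, the number of buckets, and the transactions are all hypothetical. Different pairs can share a bucket, so the filter may keep some infrequent pairs, but it never discards a frequent one.

```python
from itertools import combinations
from zlib import crc32

def bucket_of(pair, n_buckets):
    """Deterministic bucket index for a pair of item names."""
    key = "|".join(sorted(pair)).encode("utf-8")
    return crc32(key) % n_buckets

def hash_filter_pairs(transactions, min_count, n_buckets=7):
    """Return the pairs whose hash bucket count reached min_count."""
    buckets = [0] * n_buckets
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            buckets[bucket_of(pair, n_buckets)] += 1
    items = sorted({item for t in transactions for item in t})
    return [
        pair for pair in combinations(items, 2)
        if buckets[bucket_of(pair, n_buckets)] >= min_count
    ]

# Hypothetical transactions, not taken from the article.
transactions = [
    {"Rice", "Pulse", "Oil", "Milk"},
    {"Rice", "Pulse", "Oil", "Apple"},
    {"Rice", "Pulse", "Oil"},
    {"Rice", "Pulse", "Milk"},
    {"Pulse", "Oil", "Milk"},
    {"Milk", "Apple"},
]
candidates = hash_filter_pairs(transactions, min_count=3)
print(candidates)
```

The surviving pairs still need an exact count afterwards; the hash table only shrinks the list that has to be counted.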

2. Transaction Reduction

Transactions that do not contain frequent itemsets are removed from further analysis.
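
Transaction reduction can be sketched as a filter applied between passes: a transaction that contains no frequent k-itemset cannot contain a frequent (k+1)-itemset, so it can be skipped in later scans. The function name and the data below are hypothetical.

```python
def reduce_transactions(transactions, frequent_itemsets):
    """Drop transactions that contain none of the frequent itemsets."""
    return [t for t in transactions if any(s <= t for s in frequent_itemsets)]

# Hypothetical transactions and frequent pairs, not taken from the article.
transactions = [
    {"Rice", "Pulse", "Oil", "Milk"},
    {"Rice", "Pulse", "Oil", "Apple"},
    {"Rice", "Pulse", "Oil"},
    {"Rice", "Pulse", "Milk"},
    {"Pulse", "Oil", "Milk"},
    {"Milk", "Apple"},
]
frequent_pairs = [
    {"Rice", "Pulse"}, {"Rice", "Oil"}, {"Pulse", "Oil"}, {"Pulse", "Milk"},
]

reduced = reduce_transactions(transactions, frequent_pairs)
print(len(transactions), "->", len(reduced))
```

Only the transaction with Milk and Apple is removed here, since it contains none of the frequent pairs.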

Finding Association Rules

To find association rules:

1. Brute Force Method

Analyze all possible rules and calculate support and confidence.

2. Two-Step Approach

Step 1: Find frequent itemsets.

Step 2: Generate association rules from these itemsets.

Example from itemset RPO:
Possible rules:
RP → O
RO → P
PO → R
O → RP
P → RO
R → PO

For a frequent itemset with n items, the number of possible rules is:

2^n − 2

For RPO, n = 3, which gives 2^3 − 2 = 6 rules, matching the list above.
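
The rule count can be verified by enumerating every way to split an itemset into a non-empty left side and a non-empty right side. This is a minimal sketch; `rules_from_itemset` is a hypothetical helper name.

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """List every rule (left side -> right side) that uses all items of the itemset."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):            # left side has 1 .. n-1 items
        for left in combinations(items, size):
            right = tuple(i for i in items if i not in left)
            rules.append((left, right))
    return rules

rules = rules_from_itemset({"R", "P", "O"})
for left, right in rules:
    print("".join(left), "->", "".join(right))
print(len(rules) == 2 ** 3 - 2)
```

In practice each generated rule is then kept or discarded by comparing its confidence with a minimum confidence threshold.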

Advantages of Apriori Algorithm

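1. High Scalability

Can be run on large transaction databases, since pruning by minimum support limits how many itemsets must be counted.

2. Extensions Available

Many improved versions exist for different applications.

3. Easy to Understand

Simple logic and easy implementation.

4. Works with Unlabeled Data

It is an unsupervised method, so the transactions do not need labels or predefined categories.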

Disadvantages of Apriori Algorithm

1. High Computational Cost

Requires scanning the entire database multiple times.

2. Large Number of Candidate Itemsets

Can generate many possible combinations, increasing processing time.