What are market basket analysis and the Apriori algorithm?

Yannawut Kimnaruk
8 min read · Sep 18, 2022

Imagine that you are a shop owner. These are the questions that you may want to answer:

  • “When arranging the shelves in the store, what should be placed close to what?”
  • “For the online shopping application, what products should be recommended to our customers?”
  • “Which products should be recommended together in a promotion?”

To answer these questions, we can perform market basket analysis with association rules, which extract insight from large transaction datasets. An association rule identifies relationships between the items in a basket using machine learning (or statistics).

In this article, I will cover the basic metrics of association rules and the Apriori algorithm, which reduces the computational workload of association rule mining. It is important to understand the concepts before coding, since the coding is simple but interpreting the results is hard.

Example

There are 5 items in our shop: bread, butter, jam, milk, and egg. There are 5 transactions (baskets) in the database.

Transaction example

The computer represents these transactions as a matrix like this.

Transaction as a matrix
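If you want to follow along in code, below is a minimal Python sketch of this one-hot encoding. The five baskets are made up for illustration; they are not necessarily the exact transactions shown in the figure (the Bread/Milk examples later in the article happen to work out the same on them).

```python
import pandas as pd

# Hypothetical baskets -- illustrative only, not necessarily the
# exact transactions from the figure above.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"bread", "butter", "egg"},
    {"bread", "milk", "jam"},
    {"butter", "egg"},
]
items = ["bread", "butter", "jam", "milk", "egg"]

# One row per basket, one True/False column per item.
matrix = pd.DataFrame(
    [{item: item in basket for item in items} for basket in transactions]
)
print(matrix)
```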

From the transaction dataset, we will calculate metrics that can guide us toward business actions. There are 5 well-known metrics covered in this article: support, confidence, lift, leverage, and conviction.

Metrics

1. Support

How many transactions contain this itemset?

Support gives an idea of how frequent an itemset is in all the transactions.

The support value helps us identify the rules worth considering for further analysis.

Support(X) = freq(X) / N

where freq(X) is the number of occurrences of X in all transactions and N is the total number of transactions.

Ex. Support of bread = 4/5 = 0.8

These tables show the support of all itemsets of size 1–2. You can try calculating them yourself.

support of all itemsets of size 1–2

High support means that most customers buy this item, so this item is important for our shop.

We can filter out itemsets with low support since the number of occurrences is too low to find any insight (rule).
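As a small sketch, a support function in Python might look like this, reusing the hypothetical `transactions` list of sets from the earlier example:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

# With the hypothetical baskets, bread appears in 4 of 5 baskets:
print(support({"bread"}, transactions))  # 0.8
```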

2. Confidence

Considering only the transactions containing item A, how many also contain item B?

Confidence is a conditional probability (Bayesian statistics): the probability that the consequent item(s) occur in a transaction given that the antecedent item(s) are already in that transaction.

Confidence(A → B) = Support(A ∪ B) / Support(A)

Ex.
Confidence(Bread →Milk) = 2/4 = 0.5
Confidence(Milk →Jam) = 1/3 = 0.33

The tables below show the confidence of some rules.

confidence of some rules

High confidence means that a basket containing A will likely also contain B.
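As a sketch, confidence builds directly on the `support` helper above:

```python
def confidence(antecedent, consequent, transactions):
    # P(consequent in basket | antecedent in basket)
    a, b = set(antecedent), set(consequent)
    return support(a | b, transactions) / support(a, transactions)

print(confidence({"bread"}, {"milk"}, transactions))  # 0.5 with the sample baskets
```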

However, considering only the value of confidence is not enough to make a business decision. If B (the consequent itemset) is very frequent, the confidence of any rule predicting B will always be high. Lift is another metric to consider together with confidence.

3. Lift

How much does the presence of item A in a transaction affect the probability of item B?

Lift is the ratio between the probability of B occurring given that A is present and the probability of B occurring without knowing about A.

Lift(A → B) = Confidence(A → B) / Support(B)

Ex.
Lift(Bread →Milk) = 0.5/0.4 = 1.25
Lift(Milk →Jam) = 0.33/0.2 = 1.65

The tables below show the lift of some rules.

lift of some rules

If the lift is higher than 1, the presence of A in a transaction is associated with a higher probability of B in the same transaction.
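Continuing the sketch, lift divides the confidence by the consequent's baseline support:

```python
def lift(antecedent, consequent, transactions):
    # Confidence of A -> B relative to B's overall frequency.
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )

print(lift({"bread"}, {"milk"}, transactions))  # 1.25 with the sample baskets
```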

There are 2 more metrics that may not be as famous as the previous ones, but they provide other perspectives, so they are good to know.

4. Leverage

How much more often do items A and B occur together than expected if they were independent?

Leverage is the difference between the observed probability of A and B occurring together and the frequency that would be expected if A and B were independent.

Leverage(A → B) = Support(A ∪ B) − Support(A) × Support(B)

Ex.
Leverage(Bread →Milk) = 0.4-(0.8*0.4) = 0.08
Leverage(Milk →Jam) = 0.2-(0.6*0.2) = 0.08

Leverage is similar to lift but easier to interpret since it ranges from -1 to 1 while lift ranges from 0 to infinity.

A leverage value of 0 indicates independence.
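In the same sketch, leverage compares the observed co-occurrence with the independence baseline:

```python
def leverage(antecedent, consequent, transactions):
    # Observed co-occurrence minus what independence would predict.
    a, b = set(antecedent), set(consequent)
    return support(a | b, transactions) - support(a, transactions) * support(
        b, transactions
    )

print(leverage({"bread"}, {"milk"}, transactions))  # 0.08 with the sample baskets
```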

5. Conviction

Conviction helps to judge whether the rule happened to be there by chance or not.

Conviction(A → B) = (1 − Support(B)) / (1 − Confidence(A → B))

Ex.
Conviction(Bread →Milk) = (1-0.4)/(1-0.5) = 1.2
Conviction(Milk →Jam) = (1-0.2)/(1-0.33) = 1.19

A high conviction value means that the consequent is highly dependent on the antecedent (A). It can be interpreted similarly to lift.

If items are independent, the conviction is 1.
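A last piece of the sketch, with a guard for the rule-never-violated case:

```python
def conviction(antecedent, consequent, transactions):
    # (1 - support(B)) / (1 - confidence(A -> B)); 1 means independence.
    conf = confidence(antecedent, consequent, transactions)
    if conf == 1.0:
        return float("inf")  # the rule is never violated
    return (1 - support(consequent, transactions)) / (1 - conf)

print(conviction({"bread"}, {"milk"}, transactions))  # 1.2 with the sample baskets
```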

Apriori algorithm

Since the number of association rules grows exponentially with the number of items, it is not efficient to calculate all of them. If we have just 10 items, there are about 57,000 possible rules!!! Then, what about a supermarket with more than 100 items? Calculating all rules is computationally expensive.

The Apriori algorithm is designed to find only important association rules with minimum computation.

There are 2 steps in association rule mining:

  1. Itemset Generation
  2. Rule Generation

We will see how the Apriori algorithm helps us in these 2 steps.

1. Itemset Generation

We want only frequent itemsets.

A frequent itemset is one whose support is higher than a minimum support threshold. To put it simply, frequent itemsets are itemsets that occur frequently across all transactions.

The easiest way is to calculate the support of every possible itemset. However, the number of itemsets grows exponentially with the number of items:

Number of itemsets = 2ᴺ − 1, where N is the number of items.

For example, if we have 4 items, the number of itemsets is 2⁴ - 1 = 15. If we have 10 items, the number of itemsets is 2¹⁰ - 1 = 1023.

Therefore, it is computationally expensive and time-consuming to calculate support for all itemsets.

We can remove many itemsets from the support calculation by following the Apriori principle:

“A superset of an itemset always has support lower than or equal to that itemset's support.”

This is intuitive because it is harder to find a transaction that contains many required items.

Ex.

The store has 4 items: bread, butter, jam, and milk.

We can create 15 itemsets as illustrated below.

All itemsets

  • Assume we set the acceptable minimum support = 0.3.
  • If the support of bread is less than 0.3, all supersets of bread will also have support less than 0.3.
  • Then, we can exclude all supersets of bread from the support calculation. We call this “support-based pruning”.

All itemsets with the Apriori algorithm

In this example, the computation can be cut by as much as half.

We can make this more systematic with the flowchart below.

Apriori Itemset Generation Flowchart

F0 is a list of all items in the store.
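Below is a compact, non-optimized sketch of this level-wise loop, reusing the `support` helper and the hypothetical `transactions` from earlier. A real implementation would prune candidates more aggressively, but the support-based pruning idea is the same: only itemsets that pass the threshold seed the next level.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # F0: all single items in the store.
    items = sorted(set().union(*transactions))
    # Keep only the frequent 1-itemsets.
    level = [frozenset([i]) for i in items
             if support({i}, transactions) >= min_support]
    frequent = list(level)
    while level:
        # Build (k+1)-item candidates only from itemsets that survived
        # the previous level -- infrequent itemsets seed nothing.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
        frequent.extend(level)
    return frequent

print(frequent_itemsets(transactions, min_support=0.3))
```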

After we have the frequent itemsets, we can now create an association rule.

2. Rule Generation

We want only rules with high confidence.

A way to identify important rules is to filter only rules with high confidence since it means that the correlation between items A and B is high.

However, it is also not efficient to calculate the confidence of all rules (the number of rules is even higher than the number of itemsets).

We can remove many rules from the confidence calculation by following the Apriori principle:

“For rules generated from the same itemset, the confidence is lower when the number of consequent items is higher.”

For example, suppose the itemset has 4 items: A, B, C, and D. Then
confidence(A,B,C → D) ≥ confidence(A,B → C,D) ≥ confidence(A → B,C,D).

To prove this, recall the formula for confidence:

Confidence(A → B) = Support(A ∪ B) / Support(A)

For rules generated from the same itemset, the numerator is the same, so the only difference is the denominator. As we saw with support in step 1, an itemset with more items has lower support (lower frequency), so the rule with the larger antecedent has a lower denominator and therefore higher confidence.

We can perform “confidence-based pruning” just as we did support-based pruning in step 1. We will start with rules that have only 1 consequent item.

It may be confusing at first. My tip is to focus only on the consequent (the item(s) after the arrow) and handle it the same way we handled support.

Ex.

The store has 4 items: bread, butter, jam, and milk.

We can create 14 rules as illustrated below.

All rules

  • Assume we set the acceptable minimum confidence = 0.3.
  • If the confidence of {butter, jam, milk} → {bread} is less than 0.3, all rules whose consequent is a superset of {bread} will also have confidence less than 0.3.
  • Then, we can exclude all rules whose consequent is a superset of {bread} from the confidence calculation.

All rules with the Apriori algorithm

After we finish one itemset, we move on to the other frequent itemsets.

Finally, we will calculate the lift from the remaining rules and make business decisions with that data!
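In practice, you rarely code all of this from scratch. One widely used Python implementation is the mlxtend library; here is a minimal end-to-end sketch, assuming mlxtend is installed and reusing the one-hot `matrix` DataFrame from the first example:

```python
from mlxtend.frequent_patterns import apriori, association_rules

# Frequent itemsets with support >= 0.3 (support-based pruning).
freq = apriori(matrix, min_support=0.3, use_colnames=True)

# Rules with confidence >= 0.3 (confidence-based pruning).
rules = association_rules(freq, metric="confidence", min_threshold=0.3)

# Keep only rules where the antecedent raises the consequent's odds.
promising = rules[rules["lift"] > 1.0]
print(promising[["antecedents", "consequents", "support", "confidence", "lift"]])
```

The resulting DataFrame also reports the leverage and conviction metrics discussed above, so all five metrics come out of one call.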

Conclusion

In this article, I covered the metrics of market basket analysis, including support, confidence, lift, leverage, and conviction. Then, I introduced the Apriori algorithm, which reduces the computation of association rule mining by pruning itemsets and rules that do not pass the criteria.

I hope you understand the concept of Apriori. It may seem difficult at first, but it becomes very simple when implemented in Python or R.

If you like this article, please follow me for more data science content.
