Market Basket Analysis using Apriori Algorithm

Aaron R
4 min readJan 27, 2021

Market Basket Analysis is a type of frequent itemset mining which analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers.

With Market Basket Analysis, the buying patterns of the customers are represented using“Association Rules”. The interestingness of a rule is measured using two metrics viz. supportand confidence.

Example:

milk => bread [support = 2%, confidence = 60%]

A supportof 2% for the above rule states that 2% of all the transaction under analysis show that milk and bread are purchased together.

support(A => B) = P(A U B)

A confidence of 60% means that 60% of the customers who purchased milk also bought the bread.

confidence(A => B) = P(B|A) = support(A U B) / support(A)

Typically association rules are considered interesting if they satisfy both a minimum support thresholdand a minimum confidence threshold.

The following code demonstrates Market Basket Analysis with Apriori Algorithm using the vertical data format:

dataset = [ [100, [1,2,5]] , [200, [2,4]] , [300, [2,3]] , [400, [1,2,4]] , [500, [1,3]], [600, [2,3]] , [700, [1,3]] , [800, [1,2,3,5]] , [900, [1,2,3]] ]items = ['','Tshirt','Trousers','Belt','Jacket','Gloves','Sneakers'] 
#indexing starts from 1, hence first item is emptyn = len(dataset)
no_of_items = 5
min_supp = 2

In the above code snippet, dataset is a list whose first element is the transaction idand the second element is a list of item ids of items bought together. items is a list of the names of the items. The no_of_items stores the total number of items in the dataset. The min_supp stores the minimum support threshold.

def verticalFormat(dataset, no_of_items):

n = len(dataset)
data = []for i in range(1 , no_of_items+1):

record = [set([i]), ] #Stores ith itemset

tids = set() #Stores transaction ids for ith itemset

for j in range(n):

if i in dataset[j][1] :
#Append the transaction id
tids.add(dataset[j][0])

#Append the transaction id list for ith itemset
record.append(tids)#Append ith record in the dataset
data.append(record)

return [data]L = verticalFormat(dataset, no_of_items)
L

[[[{1}, {100, 400, 500, 700, 800, 900}],
[{2}, {100, 200, 300, 400, 600, 800, 900}],
[{3}, {300, 500, 600, 700, 800, 900}],
[{4}, {200, 400}],
[{5}, {100, 800}]]]

L stores a list of frequent k-itemsets. L[1] denotes a list of frequent 2-itemsets. L[2] denotes a list of frequent 3-itemsets and so on.

L[k] contains tuples where the first element of the tuple is the itemset and the second element of the tuple is a set of transaction ids that contain the corresponding itemset. Thus L[k][0] is the first tuple and L[k][0][0] is the itemset in the first tuple and L[k][0][1] is the set of transactions in the first tuple.

def frequentItemsetsTable(L, min_supp):

k=0

while True:

table = []
n = len(L[k])

for i in range(n):
for j in range(i+1,n):#Union of itemsets from ith and jth records
itemset = L[k][i][0].union(L[k][j][0])

#Intersection of TID's from ith and jth records
TIDset = L[k][i][1].intersection(L[k][j][1])

#If min_supp count is satisfied, then only insert
the record in the table

if len(TIDset) >= min_supp:

record = []

record.append(itemset)
record.append(TIDset)

if record not in table:
table.append(record)if len(table)==0:
break

if len(table)!=0 :

L.append(table)
k+=1

return LfrequentItemsetsTable(L, 2)

[[[{1}, {100, 400, 500, 700, 800, 900}],
[{2}, {100, 200, 300, 400, 600, 800, 900}],
[{3}, {300, 500, 600, 700, 800, 900}],
[{4}, {200, 400}],
[{5}, {100, 800}]],
[[{1, 2}, {100, 400, 800, 900}],
[{1, 3}, {500, 700, 800, 900}],
[{1, 5}, {100, 800}],
[{2, 3}, {300, 600, 800, 900}],
[{2, 4}, {200, 400}],
[{2, 5}, {100, 800}]],
[[{1, 2, 3}, {800, 900}], [{1, 2, 5}, {100, 800}]]]

last = L[len(L)-1]
last

[[{1, 2, 3}, {800, 900}], [{1, 2, 5}, {100, 800}]]

def printRules(left, right, confidence):

print("(", end="")

for x in left:
print(items[x], end=",")

print(")", end=" ")

print("-->", end=" ")

print("(", end="")

for x in right:
print(items[x], end=",")

print(")", end="\t")

print("\t", "Confidence = ",confidence)
print()from itertools import combinationsdef associationRules(last, conf):

#For last itemset = (1,2,3)
# left right
# 1 2,3 => items 2,3 are bought alongwith 1
# 2 1,3
# 3 1,2
# 1,2 3
# 1,3 2
# 2,3 1
#The last pair (2,3) has 2 elements
#Hence the above table goes on till nCn-1 i.e 3C2 in this case

for record in last:
itemset = record[0]
transactions = record[1]

for i in range(1, len(itemset)):
comb = list(combinations(itemset,i))

for c in comb:

left = set(c)
right = itemset.difference(left)

suppAUB = len(transactions)

k = len(left)-1

for r in L[k]: #r[0] is itemset and r[1] is TIDset
if r[0]==left:
suppA = len(r[1])

confidence = (suppAUB/suppA)*100

if confidence>= conf:
printRules(left, right, confidence)associationRules(last,0)

(T-shirt,) → (Trousers,Belt,) Confidence = 33.33333333333333

(Trousers,) → (T-shirt,Belt,) Confidence = 28.57142857142857

(Belt,) → (T-shirt,Trousers,) Confidence = 33.33333333333333

(T-shirt,Trousers,) → (Belt,) Confidence = 50.0

(T-shirt,Belt,) → (Trousers,) Confidence = 50.0

(Trousers,Belt,) → (T-shirt,) Confidence = 50.0

(T-shirt,) → (Trousers,Gloves,) Confidence = 33.33333333333333

(Trousers,) → (T-shirt,Gloves,) Confidence = 28.57142857142857

(Gloves,) → (T-shirt,Trousers,) Confidence = 100.0

(T-shirt,Trousers,) → (Gloves,) Confidence = 50.0

(T-shirt,Gloves,) → (Trousers,) Confidence = 100.0

(Trousers,Gloves,) → (T-shirt,) Confidence = 100.0

--

--