Market Basket Analysis using Apriori Algorithm

4 min readJan 27, 2021

Market Basket Analysis is a type of frequent itemset mining which analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers.

With Market Basket Analysis, the buying patterns of the customers are represented using“Association Rules”. The interestingness of a rule is measured using two metrics viz. supportand confidence.

Example:

milk => bread [support = 2%, confidence = 60%]

A supportof 2% for the above rule states that 2% of all the transaction under analysis show that milk and bread are purchased together.

support(A => B) = P(A U B)

A confidence of 60% means that 60% of the customers who purchased milk also bought the bread.

confidence(A => B) = P(B|A) = support(A U B) / support(A)

Typically association rules are considered interesting if they satisfy both a minimum support thresholdand a minimum confidence threshold.

The following code demonstrates Market Basket Analysis with Apriori Algorithm using the vertical data format:

dataset = [ [100, [1,2,5]] , [200, [2,4]] , [300, [2,3]] , [400, [1,2,4]] , [500, [1,3]], [600, [2,3]] , [700, [1,3]] , [800, [1,2,3,5]] , [900, [1,2,3]] ]items = ['','Tshirt','Trousers','Belt','Jacket','Gloves','Sneakers'] 
#indexing starts from 1, hence first item is emptyn = len(dataset)
no_of_items = 5
min_supp = 2

In the above code snippet, dataset is a list whose first element is the transaction idand the second element is a list of item ids of items bought together. items is a list of the names of the items. The no_of_items stores the total number of items in the dataset. The min_supp stores the minimum support threshold.

def verticalFormat(dataset, no_of_items):
    
    n = len(dataset)
    data = []for i in range(1 , no_of_items+1):
        
        record = [set([i]), ] #Stores ith itemset
        
        tids = set() #Stores transaction ids for ith itemset
        
        for j in range(n):
            
            if i in dataset[j][1] :
                    #Append the transaction id
                    tids.add(dataset[j][0])
 
        #Append the transaction id list for ith itemset         
        record.append(tids)#Append ith record in the dataset
        data.append(record)
        
    return [data]L = verticalFormat(dataset, no_of_items)
L

[[[{1}, {100, 400, 500, 700, 800, 900}],
[{2}, {100, 200, 300, 400, 600, 800, 900}],
[{3}, {300, 500, 600, 700, 800, 900}],
[{4}, {200, 400}],
[{5}, {100, 800}]]]

L stores a list of frequent k-itemsets. L[1] denotes a list of frequent 2-itemsets. L[2] denotes a list of frequent 3-itemsets and so on.

L[k] contains tuples where the first element of the tuple is the itemset and the second element of the tuple is a set of transaction ids that contain the corresponding itemset. Thus L[k][0] is the first tuple and L[k][0][0] is the itemset in the first tuple and L[k][0][1] is the set of transactions in the first tuple.

def frequentItemsetsTable(L, min_supp):
  
    k=0
    
    while True:
        
        table = []
        n = len(L[k])
        
        for i in range(n):
            for j in range(i+1,n):#Union of itemsets from ith and jth records
                itemset = L[k][i][0].union(L[k][j][0]) 
                
                #Intersection of TID's from ith and jth records                   
                TIDset = L[k][i][1].intersection(L[k][j][1]) 
                
                #If min_supp count is satisfied, then only insert   
                the record in the table                
                if len(TIDset) >= min_supp:  
                    
                    record = []
        
                    record.append(itemset)
                    record.append(TIDset)
                    
                    if record not in table:
                        table.append(record)if len(table)==0:
            break
            
        if len(table)!=0 :
        
            L.append(table)
            k+=1
              
    return LfrequentItemsetsTable(L, 2)

[[[{1}, {100, 400, 500, 700, 800, 900}],
[{2}, {100, 200, 300, 400, 600, 800, 900}],
[{3}, {300, 500, 600, 700, 800, 900}],
[{4}, {200, 400}],
[{5}, {100, 800}]],
[[{1, 2}, {100, 400, 800, 900}],
[{1, 3}, {500, 700, 800, 900}],
[{1, 5}, {100, 800}],
[{2, 3}, {300, 600, 800, 900}],
[{2, 4}, {200, 400}],
[{2, 5}, {100, 800}]],
[[{1, 2, 3}, {800, 900}], [{1, 2, 5}, {100, 800}]]]

last = L[len(L)-1]
last

[[{1, 2, 3}, {800, 900}], [{1, 2, 5}, {100, 800}]]

def printRules(left, right, confidence):
    
    print("(", end="")
    
    for x in left:
        print(items[x], end=",")
                        
    print(")", end="  ")   
    
    print("-->", end="  ")
                    
    print("(", end="")
    
    for x in right:
        print(items[x], end=",")
        
    print(")", end="\t")   
    
    print("\t", "Confidence = ",confidence)
    print()from itertools import combinationsdef associationRules(last, conf):
    
    #For last itemset = (1,2,3)
    # left     right
    # 1         2,3   => items 2,3 are bought alongwith 1
    # 2         1,3
    # 3         1,2
    # 1,2       3
    # 1,3       2
    # 2,3       1
    #The last pair (2,3) has 2 elements
    #Hence the above table goes on till nCn-1 i.e 3C2 in this case
    
    for record in last:
        itemset = record[0]
        transactions = record[1]
        
        for i in range(1, len(itemset)):
            comb = list(combinations(itemset,i))
            
            for c in comb:
                
                left = set(c)
                right = itemset.difference(left)
   
                suppAUB = len(transactions)
                
                k = len(left)-1
                
                for r in L[k]:  #r[0] is itemset and r[1] is TIDset
                    if r[0]==left:
                        suppA = len(r[1])
                        
                confidence = (suppAUB/suppA)*100
                
                if confidence>= conf:
                    printRules(left, right, confidence)associationRules(last,0)

(T-shirt,) → (Trousers,Belt,) Confidence = 33.33333333333333
(Trousers,) → (T-shirt,Belt,) Confidence = 28.57142857142857
(Belt,) → (T-shirt,Trousers,) Confidence = 33.33333333333333
(T-shirt,Trousers,) → (Belt,) Confidence = 50.0
(T-shirt,Belt,) → (Trousers,) Confidence = 50.0
(Trousers,Belt,) → (T-shirt,) Confidence = 50.0
(T-shirt,) → (Trousers,Gloves,) Confidence = 33.33333333333333
(Trousers,) → (T-shirt,Gloves,) Confidence = 28.57142857142857
(Gloves,) → (T-shirt,Trousers,) Confidence = 100.0
(T-shirt,Trousers,) → (Gloves,) Confidence = 50.0
(T-shirt,Gloves,) → (Trousers,) Confidence = 100.0
(Trousers,Gloves,) → (T-shirt,) Confidence = 100.0

Market Basket Analysis using Apriori Algorithm

Written by Aaron R