Master Patterns and Insights with FP-Growth Unsupervised Learning in Python 3

FP-Growth Unsupervised Learning | Innovate Yourself
1
0

Are you ready to embark on a journey into the fascinating world of unsupervised learning in Python 3? If you’re in the age range of 18-30 and aspire to become a Python pro, you’ve come to the right place. In this blog post, we’re going to dive deep into FP-Growth, a powerful algorithm for discovering patterns and associations in data. We’ll provide a step-by-step explanation, complete Python code, and mesmerizing plots, all wrapped in a conversational tone.

What is FP-Growth?

FP-Growth stands for “Frequent Pattern Growth.” It’s a data mining algorithm used for discovering frequent patterns, associations, and relationships in a dataset. This technique is particularly useful in market basket analysis, where it can help uncover purchasing patterns and associations between items in a transactional database.

The key idea behind FP-Growth is that it doesn’t require candidate generation, which is a common step in other association rule mining algorithms like Apriori. Instead, it builds a compact data structure called the FP-tree (Frequent Pattern tree) and efficiently extracts frequent itemsets.

The FP-Growth Algorithm Explained

Let’s break down the FP-Growth algorithm step by step, with a focus on how it works and how to implement it in Python.

Step 1: Data Preprocessing

Before we start, we need a dataset to work with. For this demonstration, we’ll use a sample dataset of grocery store transactions. Here’s what a small snippet of the dataset might look like:

TransactionItems
1Bread, Milk, Diapers
2Bread, Diapers
3Milk, Diapers, Beer
4Bread, Milk, Diapers, Beer
5Bread, Milk, Beer

We’ll need to pre-process this data by converting it into a suitable format for FP-Growth. This involves representing transactions as sets of items and creating a frequency table for each item.

# Import necessary libraries
from collections import defaultdict

# Sample dataset
dataset = [
    ["Bread", "Milk", "Diapers"],
    ["Bread", "Diapers"],
    ["Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Beer"]
]

# Create a frequency table for items
item_counts = defaultdict(int)
for transaction in dataset:
    for item in transaction:
        item_counts[item] += 1

print(item_counts)
defaultdict(<class 'int'>, {'Bread': 4, 'Milk': 4, 'Diapers': 4, 'Beer': 3})

In this code snippet, we’ve converted our dataset into a list of lists and calculated the frequency of each item in the dataset.

Step 2: Building the FP-Tree

The next step involves constructing the FP-tree. This tree structure efficiently represents the frequency of itemsets in the dataset. Here’s a Python implementation of this step:

class Node:
    def __init__(self, item, count, parent):
        self.item = item
        self.count = count
        self.parent = parent
        self.children = {}
        self.link = None  # Add the 'link' attribute

# Function to construct the FP-tree
def construct_FP_tree(dataset, min_support):
    item_counts = defaultdict(int)
    for transaction in dataset:
        for item in transaction:
            item_counts[item] += 1

    # Remove items that don't meet the minimum support
    items = [item for item in item_counts if item_counts[item] >= min_support]
    items.sort(key=lambda item: (-item_counts[item], item))
    root = Node("root", 0, None)
    header_table = {}
    for item in items:
        header_table[item] = None

    for transaction in dataset:
        transaction.sort(key=lambda item: (-item_counts[item], item))
        add_transaction(transaction, root, header_table)

    return root, header_table

# Function to add a transaction to the FP-tree
def add_transaction(transaction, node, header_table):
    if len(transaction) == 0:
        return
    item = transaction[0]
    if item in header_table:
        if item in node.children:
            child = node.children[item]
            child.count += 1
        else:
            child = Node(item, 1, node)
            node.children[item] = child
            if header_table[item] is None:
                header_table[item] = child
            else:
                current = header_table[item]
                while current.link is not None:
                    current = current.link
                current.link = child
        remaining_transaction = transaction[1:]
        add_transaction(remaining_transaction, child, header_table)

The construct_FP_tree function builds the FP-tree from the dataset, and the resulting tree structure is the foundation for mining frequent itemsets.

Step 3: Mining Frequent Itemsets

Now that we have our FP-tree, we can start mining frequent itemsets. This is the heart of the FP-Growth algorithm. The following code snippet shows how to mine frequent itemsets from the FP-tree:

# Function to find the support of an item
def find_item_support(header_table, item):
    count = 0
    node = header_table[item]
    while node is not None:
        count += node.count
        node = node.link
    return count

# Function to mine frequent itemsets
def find_frequent_itemsets(FP_tree, header_table, min_support, prefix, frequent_itemsets):
    items = [item for item in header_table.keys()]
    items.sort(key=lambda item: (-item_counts[item], item))
    for item in items:
        new_frequent_set = prefix.copy()
        new_frequent_set.add(item)
        support = find_item_support(header_table, item)
        if support >= min_support:
            frequent_itemsets.append((new_frequent_set, support))
            conditional_tree, conditional_header = construct_conditional_FP_tree(FP_tree, header_table, item)
            if conditional_header is not None:
                find_frequent_itemsets(conditional_tree, conditional_header, min_support, new_frequent_set, frequent_itemsets)

# Function to ascend the FP-tree and collect the path
def ascend_FP_tree(node, prefix_path):
    if node.parent is not None:
        prefix_path.append(node.item)
        ascend_FP_tree(node.parent, prefix_path)


# Function to construct conditional FP-tree
def construct_conditional_FP_tree(FP_tree, header_table, item):
    conditional_items = []
    node = header_table[item]
    while node is not None:
        prefix_path = []
        ascend_FP_tree(node, prefix_path)
        if len(prefix_path) > 1:
            conditional_items.append(prefix_path[1:])
        node = node.link

    conditional_dataset = [items for items in conditional_items]
    conditional_FP_tree, conditional_header = construct_FP_tree(conditional_dataset, min_support)
    return conditional_FP_tree, conditional_header

In the code above, we define the find_frequent_itemsets function to mine frequent itemsets by recursively exploring the FP-tree. For each frequent itemset found, we store it in the frequent_itemsets list along with its support.

Step 4: Putting It All Together

Now that we’ve explained each step of the FP-Growth algorithm, let’s put it all together into a complete Python script and visualize the results with some captivating plots.

import matplotlib.pyplot as plt

# Sample dataset
dataset = [
    ["Bread", "Milk", "Diapers"],
    ["Bread", "Diapers"],
    ["Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Beer"]
]

# Minimum

 support threshold
min_support = 2

# Construct the FP-tree
root, header_table = construct_FP_tree(dataset, min_support)

# Mine frequent itemsets
frequent_itemsets = []
find_frequent_itemsets(root, header_table, min_support, set(), frequent_itemsets)

# Sort frequent itemsets by support
frequent_itemsets.sort(key=lambda x: -x[1])

# Print the frequent itemsets
for itemset, support in frequent_itemsets:
    print(f"Itemset: {itemset}, Support: {support}")

# Create a bar plot of frequent itemsets
itemsets, supports = zip(*frequent_itemsets)
plt.barh([str(itemset) for itemset in itemsets], supports)
plt.xlabel('Support')
plt.ylabel('Frequent Itemsets')
plt.title('Frequent Itemsets in the Dataset')
plt.show()
Itemset: {'Bread'}, Support: 4
Itemset: {'Diapers'}, Support: 4
Itemset: {'Milk'}, Support: 4
Itemset: {'Beer'}, Support: 3
Itemset: {'Milk', 'Beer'}, Support: 3
Itemset: {'Milk', 'Bread'}, Support: 2
Itemset: {'Milk', 'Diapers'}, Support: 2
Itemset: {'Beer', 'Bread'}, Support: 2
Itemset: {'Beer', 'Diapers'}, Support: 2
Itemset: {'Milk', 'Beer', 'Diapers'}, Support: 2
FP-Growth Unsupervised Learning | Innovate Yourself

In this code, we use matplotlib to create a bar plot of frequent itemsets with their support values. This visualization provides a clear view of the most significant itemsets in our dataset.

Example: Web Clickstream Analysis

Objective:

In this example, we’ll apply FP-Growth to analyze the clickstream data of a website to uncover patterns in user behavior.

Sample Dataset:

Imagine a dataset containing user clickstream data, where each transaction represents the pages a user visited during a session:

Session 1: Home, Products, Checkout
Session 2: Home, About Us
Session 3: Home, Products, Contact Us
Session 4: Home, Products, Checkout, Contact Us
Session 5: Home, About Us, Checkout

Code:

You can adapt the FP-Growth code to analyze web clickstream data. In this scenario, you might be interested in finding patterns like “Users who visit the ‘Products’ page are more likely to proceed to the ‘Checkout’ page.”

Here’s how you can modify the code for web clickstream analysis:

# Sample clickstream dataset
clickstream_data = [
    ["Home", "Products", "Checkout"],
    ["Home", "About Us"],
    ["Home", "Products", "Contact Us"],
    ["Home", "Products", "Checkout", "Contact Us"],
    ["Home", "About Us", "Checkout"]
]

# Minimum support threshold
min_support = 2

# Construct the FP-tree
root, header_table = construct_FP_tree(clickstream_data, min_support)

# Mine frequent itemsets
frequent_itemsets = []
find_frequent_itemsets(root, header_table, min_support, set(), frequent_itemsets)

# Sort frequent itemsets by support
frequent_itemsets.sort(key=lambda x: -x[1])

# Print frequent itemsets and support
for itemset, support in frequent_itemsets:
    print(f"Itemset: {itemset}, Support: {support}")
Itemset: {'Home'}, Support: 5
Itemset: {'Checkout'}, Support: 3
Itemset: {'Products'}, Support: 3
Itemset: {'About Us'}, Support: 2
Itemset: {'About Us', 'Home'}, Support: 2
Itemset: {'Contact Us'}, Support: 2
Itemset: {'Contact Us', 'Home'}, Support: 2
Itemset: {'Contact Us', 'Products'}, Support: 2
Itemset: {'Products', 'Home'}, Support: 2

This code can reveal insights into user navigation patterns on the website, helping you optimize page layouts and improve the user experience.

Conclusion

In this blog post, we’ve explored the FP-Growth unsupervised learning algorithm in Python 3. We started with data preprocessing, then moved on to constructing the FP-tree, followed by mining frequent itemsets. The code provided should give you a solid foundation to apply FP-Growth in real-world scenarios.

As an aspiring Python pro, mastering algorithms like FP-Growth will undoubtedly set you on the path to becoming a data mining and machine learning expert. Whether you’re delving into market basket analysis or looking for patterns in customer behavior, FP-Growth can be a powerful tool in your arsenal.

So, what’s next for you? Take this code, experiment with it, and start uncovering hidden patterns in your data. The possibilities are endless, and the world of unsupervised learning in Python is waiting for you to explore it. Happy coding!

Remember, practice makes perfect, and the more you delve into projects like this, the closer you get to becoming a Python pro. Keep learning, keep coding, and keep pushing your boundaries. You’ve got this!

Also, check out our other playlist Rasa ChatbotInternet of thingsDockerPython ProgrammingMachine Learning, MQTTTech NewsESP-IDF etc.
Become a member of our social family on youtube here.
Stay tuned and Happy Learning. ✌🏻😃
Happy coding! ❤️🔥

Leave a Reply