*This post continues the series of data science algorithm reviews.

Association Rule Mining (ARM) is one of the most frequently used pattern mining techniques. The name might not sound familiar at first, but its results are all around us. When you shop online at Amazon or eBay, for example, the website uses your search terms and browsing history to recommend similar products that might interest you, or a bundle of items that people frequently purchase together. Recommendation systems like these can be built by applying ARM. The focus in ARM is to find hidden associations among items and use those associations to increase sales. The story about how Target inferred a teen girl’s pregnancy from her purchases before her father found out is a well-known example that is often cited in this context. Having an ARM system can help your business in various ways:

• Predicting customers’ future behavior by analyzing past purchasing trends
• Informing marketing plans (advertising, personalized coupons, and cross-selling strategies)
• Improving the ordering and location of products
• Improving the supply chain and distribution of products

To implement Association Rule Mining, all you need is transaction-based data in which each transaction has a unique transaction ID and a list of the items purchased (Fig. 1). After recording transaction data for a sufficient period of time, you can start building your own recommendation system.

Figure 1: A Walmart grocery receipt showing the transaction ID and list of items.
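
As a rough illustration of the data layout described above, here is a minimal Python sketch that groups purchase records into one item list per transaction. The rows, the transaction IDs, and the function name `group_into_baskets` are all made up for this example; a real export from a point-of-sale system will look different.

```python
from collections import defaultdict

# Hypothetical (transaction_id, item) rows; the IDs and items are invented.
rows = [
    ("T1001", "flour"),
    ("T1001", "baking powder"),
    ("T1001", "sugar"),
    ("T1002", "milk"),
    ("T1002", "flour"),
    ("T1003", "milk"),
]

def group_into_baskets(rows):
    """Collect the set of distinct items purchased under each transaction ID."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)

baskets = group_into_baskets(rows)
for transaction_id, items in sorted(baskets.items()):
    print(transaction_id, sorted(items))
```

Each basket is stored as a set because, for plain ARM, only whether an item appears in a transaction matters, not how many units were bought.
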

ARM works in two steps. The first step is discovering item sets that are frequently purchased together. This step usually takes a long time because it requires scanning the entire historical transaction data repeatedly: a level-wise search makes one pass per item-set size, which is up to n passes in the worst case, where n is the number of items. The second step generates all possible rules from each frequent item set found in step 1, then filters those rules by support (the percentage of transactions in which the items occur together) and confidence (how often the right-hand side of the rule is purchased when the left-hand side is purchased). Suitable thresholds for these measures depend on the dataset and its size.
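
The two steps can be sketched in plain Python as follows. This is a simplified level-wise search written for illustration, not the implementation used by any particular library; the toy transactions, the thresholds, and the function names are assumptions made for this example.

```python
from itertools import combinations

# Toy transactions (hypothetical data for illustration only).
transactions = [
    {"flour", "baking powder", "sugar"},
    {"flour", "baking powder", "sugar"},
    {"flour", "baking powder"},
    {"flour", "sugar"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search, growing item sets one item at a time."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        # One pass over the data per item-set size.
        level = {s: support(s, transactions) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        # Next candidates: unions of surviving sets that are exactly one item larger.
        size = len(next(iter(level))) + 1 if level else 0
        current = list({a | b for a in level for b in level if len(a | b) == size})
    return frequent

def generate_rules(frequent, min_confidence):
    """Step 2: split each frequent item set into antecedent => consequent."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent set is itself frequent, so the lookup succeeds.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules

itemsets = frequent_itemsets(transactions, min_support=0.3)
rules = generate_rules(itemsets, min_confidence=0.6)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(sup, 2), round(conf, 2))
```

Filtering each level by minimum support before building the next one is what keeps the search manageable: an item set can only be frequent if all of its smaller subsets are frequent too.
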

Here is a snapshot of sample association rules for a grocery store (Fig. 2). An ARM library in R is used in this post, but the algorithm is also available in the SAP HANA Predictive Analysis Library (PAL). For instance, the third rule, {flour, baking powder} => {sugar}, can be interpreted as follows:

Figure 2: Association rules based on grocery transactions

• Flour, baking powder, and sugar were purchased together in 1% of all transactions (the support value).
• (When people buy) {flour, baking powder} => (there is a 60% chance, the confidence value, that they will also buy) {sugar}
• The support value seems low in this example, but low support values are common with a large data set. (*For advanced Association Rule Mining models, you can find an optimal ‘minimum support value’ using a mathematical function, or use multiple minimum supports. Details can be found here!)
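
To make the two measures concrete, here are the standard definitions for a rule X => Y, followed by what the example values imply. This assumes the values in Fig. 2 follow these usual definitions, with X = {flour, baking powder} and Y = {sugar}.

```latex
\[
\operatorname{support}(X \cup Y)
  = \frac{\#\{\text{transactions containing every item in } X \cup Y\}}{\#\{\text{all transactions}\}},
\qquad
\operatorname{confidence}(X \Rightarrow Y)
  = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
\[
\operatorname{support}(X)
  = \frac{\operatorname{support}(X \cup Y)}{\operatorname{confidence}(X \Rightarrow Y)}
  = \frac{0.01}{0.60} \approx 0.0167
\]
```

In other words, under these definitions roughly 1.7% of all transactions contain both flour and baking powder, and 60% of those also contain sugar.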

So far, we have reviewed how association rules are mined. Now, let’s look at some of the most common visualization tools for ARM. You can decide how many rules to plot for a clearer visualization (the top 10 rules are plotted in Fig. 3). The most common approach is a graph plot, in which items or item sets are shown as vertices and edges indicate the relationships between them. You can also create a parallel coordinates plot, where the x-axis represents the positions of items in a rule and the arrowheads point to the consequent item, that is, the item the rule predicts will be purchased.

Figure 3: Association rules in graph and matrix plots
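
Before plotting, the rules have to be reduced to the top few and turned into vertices and edges. The sketch below shows one way to do that in plain Python. Ranking by confidence is an assumption (the post does not say which measure was used for Fig. 3), and every rule value here apart from the flour example is invented.

```python
# Hypothetical rules as (antecedent, consequent, support, confidence) tuples.
rules = [
    (("flour", "baking powder"), ("sugar",), 0.010, 0.60),
    (("bread",), ("butter",), 0.020, 0.45),
    (("milk",), ("bread",), 0.030, 0.35),
]

def top_rules(rules, n, key_index=3):
    """Keep the n rules with the highest chosen measure (index 3 = confidence)."""
    return sorted(rules, key=lambda rule: rule[key_index], reverse=True)[:n]

def rules_to_graph(rules):
    """Return (vertices, edges): one vertex per item set, one directed edge per rule."""
    vertices, edges = set(), []
    for antecedent, consequent, support, confidence in rules:
        vertices.update([antecedent, consequent])
        edges.append((antecedent, consequent, {"support": support, "confidence": confidence}))
    return vertices, edges

vertices, edges = rules_to_graph(top_rules(rules, n=2))
for source, target, measures in edges:
    print(source, "->", target, measures)
```

The resulting vertex set and edge list can be handed to whichever graph-drawing tool you prefer, with the support and confidence values used for edge width or color.
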

Additionally, if you want to incorporate the ‘unit profit’ and the ‘quantity of items sold’ into ARM, a weighted ARM can be considered. It is an advanced form of ARM that calculates a weight factor for each item by dividing the total profit made from that item by the number of units sold. A weighted ARM can help you discover rules that occur infrequently (low support) but involve items that were purchased in bulk and generated a high profit margin.
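
Here is a minimal sketch of the weight factor described above. The sales figures are invented. The way the weight is combined with support (scaling plain support by the mean weight of the items in the set) is only one possible choice, assumed here for illustration; weighted ARM variants define this step differently.

```python
# Hypothetical per-item sales totals: (total profit, units sold).
sales = {
    "flour": (500.0, 1000),
    "sugar": (300.0, 1000),
    "stand mixer": (6000.0, 50),
}

def item_weights(sales):
    """Weight factor per item: total profit divided by units sold."""
    return {item: profit / units for item, (profit, units) in sales.items()}

def weighted_support(itemset, support, weights):
    """ASSUMPTION: scale plain support by the mean weight of the items in the set."""
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return support * mean_weight

weights = item_weights(sales)
common = weighted_support({"flour", "sugar"}, 0.10, weights)  # frequent, low profit
rare = weighted_support({"stand mixer"}, 0.01, weights)       # infrequent, high profit
print(weights)
print(common, rare)
```

With these made-up numbers, the infrequent but profitable item set ends up ranked above the frequent low-profit one, which is the effect a weighted ARM is meant to capture.
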
In the next post in this series, we will discuss ensemble learning methods. Stay tuned!

Got questions? Reach out to us.