Machine Learning & Data Analytics using TabPy- Explained with a use case

Machine learning and data science concepts have revolutionized the world of BI and data analytics. Organizations across the globe want to leverage the capabilities of ML to enhance their traditional BI reporting and make better business decisions. Tableau, a front-runner in the BI space, offers TabPy, an analytics extension that lets Tableau run Python code, so ML models can be built and used inside traditional Tableau dashboards. This blog describes how ML using TabPy is applied to a customer churn use case. Before looking at the ML functionality in detail, let us first understand the use case.

Use Case for Machine Learning & Data Analytics

As customers are the most valued asset of an organization, customer retention is crucial for any organization. Customer churn analytics enables organizations to identify and analyze the factors influencing customer churn. This gives us the best use case to exhibit TabPy functionality for Machine Learning and Data Analytics.

For our use case, we have used the Telco Customer Churn dataset from the IBM Sample Datasets. The data contains information about a fictional telco company that provided home phone and internet services to California customers in Q3 2017, and it indicates which customers have left, stayed, or signed up for the service. With this dataset, our primary analysis is to predict whether a customer will churn based on their attributes. To use TabPy capabilities in Tableau, ensure that Tableau is integrated with Python.

Overview of churn analytics dashboard 

In this dashboard, we begin with the total number of customers, followed by the number of customers who have churned. We then take a deeper look at our target variable, churn.

Churn distribution in categorical features 

In the first chart, we have analyzed the percentage of churn distribution across all categorical attributes. 

Figure 1: Percentage of non-churn and churn customers across attributes

This chart shows the percentage of non-churn and churn customers across gender (based on the option selected in the ‘View by’ menu). We can see that female customers tend to churn slightly more than male customers. To view the same split for a different attribute, select that attribute in the ‘View by’ menu.
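The calculation behind this kind of chart can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual logic: the function name `churn_percentages`, the column names `gender` and `Churn`, and the sample rows are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def churn_percentages(rows, attribute, target="Churn"):
    """Return {attribute value: {target value: percentage}} for the given rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[target]] += 1
    result = {}
    for value, counter in counts.items():
        total = sum(counter.values())
        # Share of each target label within this attribute value, in percent.
        result[value] = {label: round(100 * n / total, 1) for label, n in counter.items()}
    return result

# Hypothetical sample rows, NOT taken from the Telco dataset.
sample = [
    {"gender": "Female", "Churn": "Yes"},
    {"gender": "Female", "Churn": "No"},
    {"gender": "Female", "Churn": "No"},
    {"gender": "Female", "Churn": "No"},
    {"gender": "Male", "Churn": "Yes"},
    {"gender": "Male", "Churn": "No"},
    {"gender": "Male", "Churn": "No"},
    {"gender": "Male", "Churn": "No"},
    {"gender": "Male", "Churn": "No"},
]
print(churn_percentages(sample, "gender"))
```

Passing a different attribute name plays the same role as changing the selection in the ‘View by’ menu.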

Churn distribution across numerical features 

After evaluating the target variable across categorical features, we have explored churn distribution across numerical features like Tenure and Monthly Charges. The visual can be further filtered using filters on the right side of the visualization. 

Figure 2: Churn distribution in Numerical attributes

We can see that churn increases with increasing monthly charges and is highest in the 70-95 range. Also, customers with a short tenure (new customers) who incur higher monthly charges tend to churn more.
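To make the idea of a churn rate per charge range concrete, here is a small sketch that bins a numerical field and computes the churn percentage in each bin. The function name `churn_rate_by_bin`, the field name `MonthlyCharges`, the bin edges, and the sample rows are assumptions for illustration only; they are not the dashboard's actual bins or data.

```python
def churn_rate_by_bin(rows, field, edges, target="Churn"):
    """Churn rate (%) per half-open bin [edges[i], edges[i + 1])."""
    n_bins = len(edges) - 1
    totals = [0] * n_bins
    churned = [0] * n_bins
    for row in rows:
        value = row[field]
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                totals[i] += 1
                if row[target] == "Yes":
                    churned[i] += 1
                break  # values outside every bin are simply skipped
    return {
        f"{edges[i]}-{edges[i + 1]}": round(100 * churned[i] / totals[i], 1)
        for i in range(n_bins)
        if totals[i]  # leave out empty bins to avoid dividing by zero
    }

# Hypothetical sample rows, NOT taken from the Telco dataset.
charges = [(25, "No"), (30, "No"), (40, "Yes"), (50, "No"), (60, "Yes"),
           (75, "Yes"), (80, "Yes"), (90, "No"), (100, "No"), (110, "Yes")]
rows = [{"MonthlyCharges": c, "Churn": label} for c, label in charges]
print(churn_rate_by_bin(rows, "MonthlyCharges", [20, 45, 70, 95, 120]))
```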

Features impacting churn  

Following the churn analysis across categorical and numerical attributes, we have further examined the features that most strongly affect the target variable. We have calculated a ‘Feature Importance Score’ to determine the top five features most helpful in predicting churn in our use case. Though there are many ways to obtain feature importance scores for a dataset, the XGBoost machine learning library has been used in this instance.

Figure 3: Top 5 attributes impacting churn

Let us look at the model used to determine the feature importance score:

Figure 3.1: Python code used to determine feature importance

Figure 3.1 shows the Python code used to arrive at the feature importance scores using XGBoost's XGBClassifier. The higher an attribute's ‘Feature Importance Score’, the more it contributes to the model's churn predictions. From the chart in Figure 3, we can see the top five features helpful in predicting churn. ‘Contract’ is the most important feature and is key to understanding churn patterns.

From the feature importance chart, we can clearly understand that it is essential to investigate the ‘Contract’ feature to learn more about churn.
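For readers who want a starting point, the feature importance step might look roughly like the sketch below. It is not the code from Figure 3.1. It assumes the `xgboost` package is installed and that `X` is an already-encoded numeric feature table and `y` the 0/1 churn labels. The helper names, the hyperparameters, and the scores in the usage example are hypothetical.

```python
def xgb_importance_scores(X, y):
    """Fit an XGBClassifier and return one importance score per column of X."""
    # Imported lazily so the ranking helper below works without xgboost installed.
    from xgboost import XGBClassifier
    model = XGBClassifier(n_estimators=100, max_depth=3)  # illustrative settings
    model.fit(X, y)
    return model.feature_importances_

def top_features(names, scores, k=5):
    """Return the k (name, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for illustration only; real values come from the fitted model.
names = ["Contract", "TechSupport", "OnlineSecurity", "tenure", "gender", "PaperlessBilling"]
scores = [0.40, 0.15, 0.12, 0.10, 0.01, 0.05]
for name, score in top_features(names, scores):
    print(f"{name}: {score:.2f}")
```

In practice, the output of `xgb_importance_scores(X, y)` would be passed to `top_features` together with the column names of `X`.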

Services availed by customers across contract types 

In the following charts, we have examined all the services availed by customers across all the Contract types. 

Figure 4: Charts analyzing ’Contract type’ feature by services purchased

Analysis of Internet services availed by customers of different contract types 

In the left-most viz in Figure 4, we have examined the number of customers (y-axis) availing different types of Internet Service (x-axis) across different contract types (different colors). Yellow represents one-year contract customers, blue represents two-year contract customers, purple represents month-to-month contract customers, and green represents the total number of customers. We can see that more customers have opted for fiber optic service, and most of them are on a month-to-month contract.

Analysis of miscellaneous services availed by customers of different contract types 

In the center viz in Figure 4, we have surveyed the number of customers in five services (‘Device Protection,’ ‘Online Backup,’ ‘Online Security,’ ‘Streaming Movies,’ and ‘Streaming TV’) across contract types (different colors). Since we have three contract types, the image is shown as a three-dimensional radar chart.

Analysis of phone line services availed by customers of different contract types 

In the right viz in Figure 4, we have taken a look at the number of customers availing a single phone line, more than one phone line, and customers who do not avail phone line service across different contract types (different colors). We can see that most customers have availed a single phone line service, and most of them are on a month-to-month contract.

Classifying customer profile based on the contract type 

So far, all the services availed by customers have been analyzed by our most important feature, contract type. For further clarity, we will now look at the customer profile classified by contract type.

Figure 5: Customer profile classified by contract types

In this viz, we can see the ‘Total monthly charges,’ ‘Average monthly charges’ and ‘Average tenure of customers’ for each contract type. Customers on a one-year contract incur higher average monthly charges, and customers on a two-year contract have a longer average tenure.

Correlation check between quantitative fields 

So far, our vizzes have shown specific trends of our target field (churn) and of the most impactful field (Contract type). Our next viz shows the correlation between quantitative fields. Correlation is a standard measure of the strength of the linear relationship between two quantitative fields.

Figure 6: Correlation between all quantitative fields

Correlation for all quantitative fields like ‘Monthly charges,’ ‘Total charges’ and ‘Tenure’ is shown in the chart. Correlation patterns can be checked after changing values in the filters to see the impact that one quantitative field has on another quantitative field.
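As a reminder of what such a correlation value measures, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The blog does not state which correlation method the dashboard uses, so Pearson is an assumption here, and the sample values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant (correlation undefined).
    return cov / sqrt(var_x * var_y)

# Hypothetical values: total charges grow exactly linearly with tenure here.
tenure = [1, 12, 24, 48, 72]
total_charges = [50.0 * t for t in tenure]
print(round(pearson(tenure, total_charges), 3))
```

A value close to 1 means the two fields rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.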

Machine learning prediction of churn and non-churn customers 

Since we have a fair idea of the correlation between quantitative fields, we can proceed to our end-to-end machine learning prediction to determine if a customer will churn or not based on the most important fields impacting churn.

Figure 8: Churn Prediction attribute that predicts if a customer will churn or not

In this viz, we have ‘Customer ID’ and the top four fields (‘Contract type,’ ‘TechSupport,’ ‘Online Security,’ ‘Tenure’), which together give us the prediction for each customer (non-churn customer or churn customer). Since the prediction comes from a machine learning model, its accuracy may differ from one model to another.

To predict churn and non-churn customers, the PyCaret machine learning package is used along with a Naïve Bayes classification model. The model is built and saved as a pickle file so that it can be loaded in TabPy code. The pickle file is then used inside a Tableau calculated field that calls TabPy.

The code used in the calculated field is given below: 

Figure 8.3: Entire code used in the calculated field.
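The general shape of the Python side of such a calculated field can be sketched as follows. This is not the code from Figure 8.3. Tableau script functions such as SCRIPT_STR pass each field to Python as a list (`_arg1`, `_arg2`, and so on) and expect a list of the same length back. The function names, the label mapping, and the `StubModel` class below are assumptions for illustration; a real setup would load the model saved from PyCaret and would need to pass inputs in whatever form that model expects.

```python
import pickle

# Assumed mapping from the model's 0/1 output to the labels shown in the viz.
LABELS = {0: "Non-churn customer", 1: "Churn customer"}

def load_model(path):
    """Load a previously pickled model from disk (path is supplied by the caller)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

def predict_churn(model, contract, tech_support, online_security, tenure):
    """Take one list per field, as Tableau would pass them, and return one label per row."""
    rows = list(zip(contract, tech_support, online_security, tenure))
    predictions = model.predict(rows)
    return [LABELS[int(p)] for p in predictions]

class StubModel:
    """Hypothetical stand-in for the trained classifier, used only to run the sketch."""
    def predict(self, rows):
        # Toy rule: short-tenure month-to-month customers are flagged as churn.
        return [1 if row[0] == "Month-to-month" and row[3] < 12 else 0 for row in rows]

print(predict_churn(
    StubModel(),
    ["Month-to-month", "Two year"],
    ["No", "Yes"],
    ["No", "Yes"],
    [3, 40],
))
```

Keeping the prediction logic in a single function like this makes it easy to test in plain Python before pasting it into the calculated field.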

In this blog, we have made a detailed study of our use case, customer churn analytics. We have also looked at how TabPy has enabled us to use ML models to calculate feature importance scores, predict customer churn, and visualize the results in Tableau. We hope this blog helped you understand some real-life applications of TabPy and the value it can add to your analysis and business decisions.

To learn more about TabPy scripting, you can refer to our blog series:
1. Connecting Python with Tableau

2. Exploring the basics of TabPy Coding

3. Exploring Advanced Analytics using TabPy – Sentimental Analysis

Interested in implementing Machine learning functionalities using TabPy in your Tableau dashboard? To learn more about Visual BI’s Tableau Consulting and End User Training Programs, contact us here.  

Check out our other blogs on Tableau here



Copyright © Visual BI Solutions Inc.
