Machine learning and data science concepts have revolutionized the world of BI and data analytics. Organizations across the globe want to leverage the capabilities of ML to enhance their traditional BI reporting and make better business decisions. Tableau, a front-runner in the BI space, came up with TabPy – a powerful analytics extension that enables building and using ML models inside traditional Tableau dashboards. This blog describes how ML using TabPy is applied to a customer churn use case. Before looking at the ML functionalities of TabPy in detail, let us first understand the use case.
Use Case for Machine Learning & Data Analytics
Customers are the most valued asset of an organization, so customer retention is crucial. Customer churn analytics enables organizations to identify and analyze the factors influencing customer churn. This makes it a good use case to exhibit TabPy functionality for Machine Learning and Data Analytics.
For our use case, we have used the Telco Customer Churn dataset from the IBM Sample Datasets. The data describes a fictional telco company that provided home phone and internet services to California customers in Q3 2017, and it indicates which customers have left, stayed, or signed up for the service. With this dataset, our primary analysis is predicting whether a customer will churn based on their attributes. To enable TabPy capabilities in Tableau, ensure that Tableau is integrated with Python.
Overview of churn analytics dashboard
In this dashboard, we begin by examining the total number of customers, followed by the number of customers who have churned. Then, we take a deeper look at our target variable, which is churn.
Churn distribution in categorical features
In the first chart, we have analyzed the percentage of churn distribution across all categorical attributes.
This chart shows the percentage of non-churn and churn customers across gender (based on the option selected in the ‘View by’ menu). We can see that female customers tend to churn slightly more than male customers. To view the percentage of non-churn and churn customers across a different attribute, select the required attribute in the ‘View by’ menu.
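The percentages behind such a chart can be sketched in plain Python. This is a minimal illustration rather than the calculation used in the dashboard: the column names `gender` and `Churn` and the sample rows are assumptions made up for the example, not values from the Telco dataset.

```python
from collections import Counter, defaultdict


def churn_share_by_category(rows, category, target="Churn"):
    """Return {category value: {target value: percentage of that category}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[category]][row[target]] += 1

    shares = {}
    for value, counter in counts.items():
        total = sum(counter.values())
        shares[value] = {
            label: round(100 * n / total, 1) for label, n in counter.items()
        }
    return shares


# Made-up sample rows for illustration only (not real Telco data).
sample = (
    [{"gender": "Female", "Churn": "Yes"}]
    + [{"gender": "Female", "Churn": "No"}] * 3
    + [{"gender": "Male", "Churn": "Yes"}]
    + [{"gender": "Male", "Churn": "No"}] * 4
)

shares = churn_share_by_category(sample, "gender")
for value, split in shares.items():
    print(value, split)
```

Passing a different column name as `category` mirrors what the ‘View by’ menu does in the dashboard.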
Churn distribution across numerical features
After evaluating the target variable across categorical features, we have explored churn distribution across numerical features like Tenure and Monthly Charges. The visual can be further filtered using filters on the right side of the visualization.
We can see that churn increases with increasing Monthly Charges (highest in the 70-95 range). Also, customers with a short tenure (new customers) who incur higher monthly charges tend to churn more.
Features impacting churn
Following the churn analysis across categorical and numerical attributes, we have further examined the features that most strongly affect the target variable. We have calculated a ‘Feature Importance Score’ to determine the top five features most helpful in predicting churn in our use case. Though there are many ways to obtain feature importance scores for a dataset, the XGBoost Machine Learning library has been used in this instance.
Let us look at the model used to determine the feature importance score:
The higher the ‘Feature Importance Score’ of an attribute, the more it contributes to predicting churn. From this chart, we can see the top 5 features helpful in predicting churn. ‘Contract’ is the most important feature, which makes it key to understanding churn patterns. Figure 3.1 shows the Python code used to arrive at the feature importance score using XGB Classifier.
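For readers without access to the figure, the idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from Figure 3.1: the column names `Churn` and `customerID`, the integer encoding of categorical columns, and the hyperparameters are all illustrative choices. `pandas` and `xgboost` are imported lazily, so the ranking helper runs without them.

```python
def top_features(importances, k=5):
    """Rank a {feature: score} mapping and keep the k highest-scoring features."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def xgb_feature_importance(csv_path, target="Churn", id_column="customerID"):
    """Fit an XGBClassifier and return {feature: importance score}.

    Assumes pandas and xgboost are installed, and that the target column
    holds "Yes"/"No" values (both are assumptions for this sketch).
    """
    import pandas as pd
    from xgboost import XGBClassifier

    df = pd.read_csv(csv_path)
    y = (df[target] == "Yes").astype(int)
    X = df.drop(columns=[target, id_column], errors="ignore")

    # Crude encoding: turn every non-numeric column into integer category codes.
    for col in X.select_dtypes(exclude="number").columns:
        X[col] = X[col].astype("category").cat.codes

    model = XGBClassifier(n_estimators=100, max_depth=4)
    model.fit(X, y)
    return {col: float(score) for col, score in zip(X.columns, model.feature_importances_)}


# Placeholder scores with generic names, only to show the ranking step.
placeholder_scores = {"feature_a": 0.10, "feature_b": 0.50, "feature_c": 0.30}
print(top_features(placeholder_scores, k=2))
```

With a real file, `top_features(xgb_feature_importance("path/to/telco.csv"))` would give the five highest-scoring features; the path here is hypothetical.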
From the feature importance chart, we can clearly understand that it is essential to investigate the ‘Contract’ feature to learn more about churn.
Services availed by customers across contract types
In the following charts, we have examined all the services availed by customers across all the Contract types.
Analysis of Internet services availed by customers of different contract types
In the left-most viz in Figure 4, we have examined the number of customers (y-axis) availing different types of Internet Service (x-axis) across different contract types (different colors). Yellow represents one-year contract customers, blue represents two-year contract customers, purple represents month-to-month contract customers, and green represents the total number of customers. We can see that more customers have opted for fiber optic service, and most of them are on a month-to-month contract.
Analysis of miscellaneous services availed by customers of different contract types
In the center viz in Figure 4, we have surveyed the number of customers in five services (‘Device Protection,’ ‘Online Backup,’ ‘Online Security,’ ‘Streaming Movies,’ and ‘Streaming TV’) across contract types (different colors). Since we have three contract types, the image is shown as a three-dimensional radar chart.
Analysis of phone line services availed by customers of different contract types
In the right viz in Figure 4, we have taken a look at the number of customers availing a single phone line, more than one phone line, and customers who do not avail phone line service across different contract types (different colors). We can see that most customers have availed a single phone line service, and most of them are on a month-to-month contract.
Classifying customer profile based on the contract type
So far, all the services availed by customers have been analyzed by our most important feature, contract type. For further clarity, we will now look at the customer profile classified by contract type.
In this viz, we can see the ‘Total monthly charges,’ ‘Average monthly charges’ and ‘Average tenure of customers’ across Contract type. Customers on a one-year contract incur higher Average monthly charges, and customers on a two-year contract have a better ‘Average tenure.’
Correlation check between quantitative fields
All our vizzes so far have showcased specific trends of our target field (churn) and of the most impactful field (Contract type). Our next viz shows the correlation between quantitative fields. Correlation is a common measure of the strength of the linear relationship between two quantitative fields.
Correlation for all quantitative fields like ‘Monthly charges,’ ‘Total charges’ and ‘Tenure’ is shown in the chart. Correlation patterns can be checked after changing values in the filters to see the impact that one quantitative field has on another quantitative field.
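As a reminder of what the correlation value measures, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The sample values are made up for illustration and are not taken from the dataset; the dashboard itself may compute correlation differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant field")

    return cov / sqrt(var_x * var_y)


# Made-up values standing in for 'Tenure' and 'Total charges'.
tenure = [1, 12, 24, 48, 72]
total_charges = [70, 900, 1500, 3900, 5200]
print(round(pearson(tenure, total_charges), 3))
```

A value close to 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.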
Machine learning prediction of churn and non-churn customers
Since we have a fair idea of the correlation between quantitative fields, we can proceed to our end-to-end machine learning prediction to determine if a customer will churn or not based on the most important fields impacting churn.
In this viz, we have ‘Customer ID’ and the top four fields (‘Contract type,’ ‘TechSupport,’ ‘Online Security,’ ‘Tenure’), which give us the predictions (non-churn customer or churn customer). Since we have used machine learning functionalities, the accuracy of prediction may differ from one model to another.
To predict churn and non-churn customers, the PyCaret machine learning package is used along with the Naïve Bayes Classification model. The machine learning model is built and saved as a pickle file to be used in TabPy code. The pickle file is then loaded in a Tableau calculated field that runs through TabPy.
The code used in the calculated field is given below:
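Since the original code is not reproduced here, the following is a hedged sketch of one way to wire a saved PyCaret model into TabPy, not the calculated field used in this dashboard. The endpoint name `ChurnPredict`, the model file name `churn_nb_model`, the column names, and the TabPy URL are all assumptions. TabPy passes each argument as a list and expects a list back, which is the shape `make_churn_predictor` produces.

```python
def make_churn_predictor(predict_labels):
    """Wrap a row-wise label predictor in the list-in, list-out shape TabPy uses."""

    def predict_churn(contract, tech_support, online_security, tenure):
        rows = [
            {"Contract": c, "TechSupport": t, "OnlineSecurity": o, "tenure": n}
            for c, t, o, n in zip(contract, tech_support, online_security, tenure)
        ]
        return [str(label) for label in predict_labels(rows)]

    return predict_churn


def pycaret_predict_labels(model_name="churn_nb_model"):
    """Build a predictor from a model saved with PyCaret's save_model().

    The model name is hypothetical; load_model() reads "<model_name>.pkl".
    """
    import pandas as pd
    from pycaret.classification import load_model, predict_model

    model = load_model(model_name)

    def predict_labels(rows):
        scored = predict_model(model, data=pd.DataFrame(rows))
        # The label column is "prediction_label" in PyCaret 3.x and "Label" in 2.x.
        column = "prediction_label" if "prediction_label" in scored.columns else "Label"
        return scored[column].tolist()

    return predict_labels


def deploy(endpoint="ChurnPredict", url="http://localhost:9004/"):
    """Publish the predictor to a running TabPy server (assumed to be at `url`)."""
    from tabpy.tabpy_tools.client import Client

    client = Client(url)
    predictor = make_churn_predictor(pycaret_predict_labels())
    client.deploy(endpoint, predictor, "Churn prediction", override=True)


# A Tableau calculated field could then call the endpoint along these lines
# (field names are assumptions):
#   SCRIPT_STR("return tabpy.query('ChurnPredict', _arg1, _arg2, _arg3, _arg4)['response']",
#              ATTR([Contract]), ATTR([Tech Support]), ATTR([Online Security]), ATTR([Tenure]))

# Stand-in rule used only to exercise the wrapper; it is NOT the trained model.
def stand_in(rows):
    return ["Churn" if r["Contract"] == "Month-to-month" else "No churn" for r in rows]


demo = make_churn_predictor(stand_in)
print(demo(["Month-to-month", "Two year"], ["No", "Yes"], ["No", "Yes"], [1, 60]))
```

Keeping the model loading separate from the TabPy-facing function makes it possible to test the wrapper without PyCaret or a TabPy server.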
In this blog, we have made a detailed study of our use case, customer churn analytics. We have also looked at how TabPy has enabled us to use ML models to calculate feature importance scores, predict customer churn and visualize the results in Tableau. We hope this blog helped you understand some real-life applications of TabPy and the value it can add to your analysis and business decisions.
To learn more about TabPy scripting, you can refer to our blog series:
1. Connecting Python with Tableau
2. Exploring the basics of TabPy Coding
3. Exploring Advanced Analytics using TabPy – Sentimental Analysis