This blog continues our discussion of ML/AI capabilities in Power BI. In our previous blog, we talked about Automated Machine Learning in Power BI, a nice feature for business analysts and citizen data scientists. However, it is only available in Power BI Premium and Embedded capacities, which prevents some users from integrating data science solutions into their advanced analytics process.
In this blog, we will introduce PyCaret, an open-source, low-code machine learning library that can be integrated into Power BI. It enables customers to build machine learning models with a few lines of code in Power BI.
Before delving into the ML capabilities of PyCaret, you first need to install PyCaret on your local machine or in a virtual environment and set your Python directory in Power BI.
To install PyCaret, follow the instructions in this link.
In Power BI Desktop, select File > Options and Settings > Options > Python scripting to set your Python directory.
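If you are not sure which Python installation to point Power BI at, a small check like the one below can help. It is only a sketch: run it with the interpreter you installed PyCaret into, and it prints the path of that interpreter and whether the pycaret package can be found there. The folder containing the printed executable is typically the one to select under Python scripting.

```python
import importlib.util
import sys

# Path of the interpreter running this script. The folder that contains it
# is usually what Power BI expects as the Python home directory.
python_executable = sys.executable

# find_spec returns None when the package is not installed in this environment.
pycaret_installed = importlib.util.find_spec("pycaret") is not None

print("Python executable:", python_executable)
print("PyCaret installed:", pycaret_installed)
```

If the second line prints False, PyCaret was installed into a different environment than the one you are running.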
The data that we will use in this model is the Bank Customers dataset for churn modeling from Kaggle, an online data science community. You can download the data from here. Keep in mind that the data we use for modeling has already been preprocessed, so it is slightly different from the original data. In particular, I one-hot encoded the Geography column, removed the CustomerID, Surname and RowNumber columns, and split the data into Churn Training and Churn Testing datasets. For convenience, I include the link to the prepared data here.
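If you prefer to prepare the original Kaggle file yourself, the steps described above can be sketched with pandas as follows. This is an illustration rather than the exact script used for this post: the function name, the 80/20 split and the random seed are assumptions, and the identifier column may be spelled CustomerId in your copy of the file, so both spellings are handled.

```python
import pandas as pd


def prepare_churn_data(raw, train_frac=0.8, seed=42):
    """One-hot encode Geography, drop identifier columns, and split the rows."""
    # errors="ignore" skips any of these names that are absent from the file.
    id_columns = ["RowNumber", "CustomerID", "CustomerId", "Surname"]
    cleaned = raw.drop(columns=id_columns, errors="ignore")

    # Replace Geography with one indicator column per country.
    cleaned = pd.get_dummies(cleaned, columns=["Geography"])

    # train_frac and seed are illustrative assumptions, not the post's values.
    # The split relies on the DataFrame having a unique index.
    train = cleaned.sample(frac=train_frac, random_state=seed)
    test = cleaned.drop(train.index)
    return train, test
```

You would call it on the DataFrame returned by pd.read_csv for the downloaded file and save the two results as the Churn Training and Churn Testing datasets.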
Building Customer Churn Prediction Model
After loading the data into Power BI, you can see the Churn Training and Churn Testing datasets when you open Power Query Editor.
In your Churn Training dataset, select Transform > Run Python Script. You should see a Python editor where you can enter Python scripts.
Our goal is to predict the column Exited, which indicates whether a customer left the bank (1) or not (0). This can be formulated as a classification problem. Enter the training script in the Python editor and select OK to run the model.
The script creates the popular XGBoost model to predict the target label Exited. We then save the model as a pickle file in the directory C:/Users/VisualBI/Desktop/ so we can use it in the future.
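The training script described above can be sketched as follows. Treat it as a minimal example under a few assumptions: it targets PyCaret 1.x/2.x, where setup accepts silent=True (as far as I know, later releases dropped that argument), the function name is my own, and churn_model is a hypothetical file name because only the folder is given here. Power BI passes the current query to the script as a pandas DataFrame named dataset.

```python
def train_churn_model(dataset, model_path, target="Exited"):
    """Fit an XGBoost classifier with PyCaret and save it as model_path + '.pkl'."""
    # Imported inside the function so it can be defined before PyCaret is needed.
    from pycaret.classification import create_model, save_model, setup

    # silent=True skips the interactive data-type confirmation, which Power BI
    # cannot answer. Assumption: PyCaret 1.x/2.x.
    setup(data=dataset, target=target, silent=True)

    model = create_model("xgboost")
    save_model(model, model_path)  # PyCaret appends the .pkl extension itself
    return model


# Power BI exposes the current query as a DataFrame named `dataset`,
# so this part only runs inside the Run Python Script editor.
if "dataset" in globals():
    # "churn_model" is a hypothetical file name, not taken from the post.
    train_churn_model(dataset, "C:/Users/VisualBI/Desktop/churn_model")
```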
Keep in mind that PyCaret handles most of the feature engineering and data splitting steps for you, which is useful for business users who do not have programming knowledge.
Then, we go back to our Churn Testing data, select Transform > Run Python Script and run the scoring script.
The script loads the model that we previously trained and applies it to the Churn Testing dataset. The result is saved in the Label column, so you can compare the original label column Exited with the predicted label Label.
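The scoring step described above can be sketched like this. Again, it is an illustration under assumptions: the function name is mine, and the path must match whatever name the model was saved under (churn_model here is hypothetical). In PyCaret 1.x/2.x, predict_model adds the Label column mentioned above; newer versions may name the prediction column differently.

```python
def score_churn_data(dataset, model_path):
    """Load a saved PyCaret model and append its predictions to dataset."""
    # Imported inside the function so it can be defined before PyCaret is needed.
    from pycaret.classification import load_model, predict_model

    model = load_model(model_path)  # path is given without the .pkl extension
    return predict_model(model, data=dataset)


# Power BI exposes the current query as `dataset` and reads the DataFrame
# left in that variable back as the query result.
if "dataset" in globals():
    # "churn_model" is a hypothetical file name, not taken from the post.
    dataset = score_churn_data(dataset, "C:/Users/VisualBI/Desktop/churn_model")
```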
PyCaret enables business users and citizen data scientists to discover a deeper layer of advanced analytics. It is open source, easy to use, and provides a wide range of functions within Power BI, which makes it an exciting machine learning library for business users with little programming or statistics knowledge.
Stay tuned for more blogs from me about Machine Learning and AI. Let us know if you are seeking additional guidance in planning your Power BI governance program. Read more blogs from the Power BI category here.