Data cleansing is a part of data wrangling by which the data is transformed in order to get meaningful analysis.

Uncleansed data may contain unwanted characters, null values, unformatted categorical values which may result in misguided decision making. Depending on the data pattern, replacing quantitative nulls with statistically derived values may avoid significant variation in the analysis.

Data Cleansing is a necessary preprocessing step in most areas including,

  1. Machine learning
  2. Data Prediction
  3. Data Analysis and Reporting

Data Cleansing can be performed in Alteryx using tools like Data Cleansing and Imputation.

Qualitative values can be sanitized with the help of Data Cleansing Tool and Quantitative values with Imputation Tool. These tools can be found under Preparation tab in Alteryx,

Data Cleansing and Imputation Tool in Alteryx

Data Cleansing tool can be used to format the Data in the following possible ways:

  1. Removing unwanted spaces
  2. For a String Datatype column, removing letters from a Numeric column like Zip code, Phone number or removing Numbers from a Text column like Country Code
  3. Removing Punctuations
  4. Converting the case of the letters (e.g. ALL CAPS for Key, Camel Case for Description)

Scenario

Below is the sample data where Country Column has different formats which have unwanted spaces, Numbers, punctuations, no consistent case.

Data Cleansing and Imputation Tool in Alteryx

Below enabled the setting for Country Column in Data Cleansing tool will format the values to make the data consistent.

Data Cleansing and Imputation Tool in Alteryx

Output

Data Cleansing and Imputation Tool in Alteryx

Scenario

In the above screenshot, the Product ID column is supposed to have only numbers. There are some unwanted characters that need to be cleansed.

Below settings for Product ID column in Data Cleansing tool will clean the data

Data Cleansing and Imputation Tool in Alteryx

Output

Data Cleansing and Imputation Tool in Alteryx

Imputation tool can be used to replace values in the following possible ways,

  1. Either a null value or a specific value from the data can be replaced
  2. The value can be replaced with a specific value or a derived value from data like Average, median, mode

Scenario

In the above screenshot, there is a value -10 which needs to be replaced with 10. Below settings for Sales Amount column in Imputation tool will replace the values

Data Cleansing and Imputation Tool in Alteryx

There are options as create a Flag column to indicate the replaced records or to replace the values in a separate Column without disturbing the original column.

Data Cleansing and Imputation Tool in Alteryx

Output

Data Cleansing and Imputation Tool in Alteryx

Scenario

In the above screenshot, the null values need to be replaced with some meaningful value. Below setting for Sales Amount in Imputation tool will replace the null value to Average value.

Data Cleansing and Imputation Tool in Alteryx

Output

Data Cleansing and Imputation Tool in Alteryx

With the help of Data Cleansing and Imputation tool, the data looks much cleaner and meaningful.

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


Corporate HQ:
5920 Windhaven Pkwy, Plano, TX 75093

+1 888-227-2794

+1 972-232-2233

+1 888-227-7192

solutions@visualbi.com


Copyright © Visual BI Solutions Inc.

Subscribe To Our Newsletter

Subscribe To Our Newsletter

Join our mailing list to receive the latest news and updates from our team.

You have Successfully Subscribed!

Share This!

Share this with your friends and colleagues!