Data cleansing is a part of data wrangling by which the data is transformed in order to get meaningful analysis.

Uncleansed data may contain unwanted characters, null values, unformatted categorical values which may result in misguided decision making. Depending on the data pattern, replacing quantitative nulls with statistically derived values may avoid significant variation in the analysis.

Data Cleansing is a necessary preprocessing step in most areas including,

  1. Machine learning
  2. Data Prediction
  3. Data Analysis and Reporting

Data Cleansing can be performed in Alteryx using tools like Data Cleansing and Imputation.

Qualitative values can be sanitized with the help of Data Cleansing Tool and Quantitative values with Imputation Tool. These tools can be found under Preparation tab in Alteryx,
data-cleansing-imputation-tool-alteryx

 

Data Cleansing tool can be used to format the Data in the following possible ways:

  1. Removing unwanted spaces
  2. For a String Datatype column, removing letters from a Numeric column like Zip code, Phone number or removing Numbers from a Text column like Country Code
  3. Removing Punctuations
  4. Converting the case of the letters (e.g. ALL CAPS for Key, Camel Case for Description)

Scenario

Below is the sample data where Country Column has different formats which have unwanted spaces, Numbers, punctuations, no consistent case.
data-cleansing-imputation-tool-alteryx

Below enabled the setting for Country Column in Data Cleansing tool will format the values to make the data consistent.
data-cleansing-imputation-tool-alteryx

Output

data-cleansing-imputation-tool-alteryx

Scenario

In the above screenshot, the Product ID column is supposed to have only numbers. There are some unwanted characters that need to be cleansed.

Below settings for Product ID column in Data Cleansing tool will clean the data
data-cleansing-imputation-tool-alteryx

Output
data-cleansing-imputation-tool-alteryx

Imputation tool can be used to replace values in the following possible ways,

  1. Either a null value or a specific value from the data can be replaced
  2. The value can be replaced with a specific value or a derived value from data like Average, median, mode

Scenario

In the above screenshot, there is a value -10 which needs to be replaced with 10. Below settings for Sales Amount column in Imputation tool will replace the values
data-cleansing-imputation-tool-alteryx

There are options as create a Flag column to indicate the replaced records or to replace the values in a separate Column without disturbing the original column.
data-cleansing-imputation-tool-alteryx

Output

data-cleansing-imputation-tool-alteryx

Scenario

In the above screenshot, the null values need to be replaced with some meaningful value. Below setting for Sales Amount in Imputation tool will replace the null value to Average value.

data-cleansing-imputation-tool-alteryx

Output

data-cleansing-imputation-tool-alteryx

With the help of Data Cleansing and Imputation tool, the data looks much cleaner and meaningful.

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

 

Subscribe to our Newsletter

5920 Windhaven Pkwy
Plano TX 75093.

+1 888-227-2794

+1 972-232-2233

+1 888-227-7192

solutions@visualbi.com