Data cleansing is a part of data wrangling by which the data is transformed in order to get meaningful analysis.
Uncleansed data may contain unwanted characters, null values, unformatted categorical values which may result in misguided decision making. Depending on the data pattern, replacing quantitative nulls with statistically derived values may avoid significant variation in the analysis.
Data Cleansing is a necessary preprocessing step in most areas including,
- Machine learning
- Data Prediction
- Data Analysis and Reporting
Data Cleansing can be performed in Alteryx using tools like Data Cleansing and Imputation.
Qualitative values can be sanitized with the help of Data Cleansing Tool and Quantitative values with Imputation Tool. These tools can be found under Preparation tab in Alteryx,
Data Cleansing tool can be used to format the Data in the following possible ways:
- Removing unwanted spaces
- For a String Datatype column, removing letters from a Numeric column like Zip code, Phone number or removing Numbers from a Text column like Country Code
- Removing Punctuations
- Converting the case of the letters (e.g. ALL CAPS for Key, Camel Case for Description)
Below is the sample data where Country Column has different formats which have unwanted spaces, Numbers, punctuations, no consistent case.
Below enabled the setting for Country Column in Data Cleansing tool will format the values to make the data consistent.
In the above screenshot, the Product ID column is supposed to have only numbers. There are some unwanted characters that need to be cleansed.
Below settings for Product ID column in Data Cleansing tool will clean the data
Imputation tool can be used to replace values in the following possible ways,
- Either a null value or a specific value from the data can be replaced
- The value can be replaced with a specific value or a derived value from data like Average, median, mode
In the above screenshot, there is a value -10 which needs to be replaced with 10. Below settings for Sales Amount column in Imputation tool will replace the values
There are options as create a Flag column to indicate the replaced records or to replace the values in a separate Column without disturbing the original column.
In the above screenshot, the null values need to be replaced with some meaningful value. Below setting for Sales Amount in Imputation tool will replace the null value to Average value.
With the help of Data Cleansing and Imputation tool, the data looks much cleaner and meaningful.