The latest release from Tableau, Tableau 9.0 touts a lot of new features that make wrangling with data a whole lot easier. One of the more significant of these features is the support provided by Tableau for Regular Expressions. So, what are Regular Expressions and why is Tableau’s support for regular expressions significant?

Regex

Regular expressions are wildcards on steroids. They are a pattern matching format for strings with an incredible amount of customization thrown in. Most of us would have felt the need to separate the phone number or an email id that is part of the data in a single cell. If the required number or email id was in the beginning or end we could have used left() or right()  function to extract them, but what if they are in the middle or if their positioning within the cell was random. Even if they were in the beginning or end of the cell, there is the unanswered question of the number that you would pass as argument to the function. A phone number will probably have the same number of digits but the number of characters in an email id is highly variable.

In such cases, Tableau’s support for regular expression saves us a great deal of time and effort. A phone number, email id or a zip code follows a particular format. An email id is usually of the format xyz@abc.com, a phone number or a zip code is usually a sequence of digits more often than not separated by a hyphen. It is these patterns that we leverage within Tableau to manipulate strings using regular expressions.

Tableau regular expressions

The pattern for catching an email id is [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-z]{2,4}). Let me explain this in detail.

  • [A-Za-z0-9._%+-] represents the set of characters before the @ symbol.+ represents that there can be one or more of the preceding character meaning one or more of the character types represented between the square brackets.
  • @ Represents the @ in the middle of an email id.
  • \. Represents the period in front of the domain name. The backslash is used because period is being a special character cannot be used natively.
  • [a-z]{2,4} represents that the domain name is made of alphabetical characters of length 2 to 4.

Tableau Regex

Let’s try out another use case for the Regex functions. Assume you have the order locations for different orders placed , you want to make sure you have the postal code to which you need to send your order deliveries to. You will have to analyse your order location string to find out if it has the required postal code. A postal code is a series of numerical charecters of length five. Let’s use REGEX_MATCH to validate each string.

REGEXP_MATCH

The REGEX function returns true if your order location string contains a series of five numeric characters thus validating your order location entry.

Regex function

Tableau 9.0 offers 4 functions  that leverage the power of regular expressions:

  • REGEX_EXTRACT(string,REGEX pattern) – Extracts the substring from a string field that matches the specified REGEX pattern.
  • REGEX_EXTRACT_NTH(string,REGEX pattern, index) – Returns the nth grouped substring in the string field where n is the index
  • REGEX_MATCH(string,REGEX pattern) – Returns True if the regex pattern is matched in the string field.
  • REGEX_REPLACE(string,REGEX pattern, replacement) – Replaces the matched pattern with the specified replacement text.

This is just a sample of what regular expressions are capable of. Further use cases include separating phone numbers and zip-codes from mixed data, reformatting data into a more presentable format ,testing the validity of a string entry and so on. regexr.com is a great place to learn more about regular expressions and to test out your regular expressions.

 

Learn More

Subscribe to our Newsletter

5600 Tennyson Pkwy
Suite 120
Plano TX 75024.

+1 888-227-2794

+1 972-232-2233

+1 888-227-7192

solutions@visualbi.com