In most ETL tools, we are limited by what the IDE offers by way of functions and transforms available. As a workaround for more complicated logic, Data Services allows you to define your own transforms by way of Python code which will then be executed by BODS. This gives the flexibility to achieve the transformation of data at any complex level and create new records, datasets out of it based on our requirements. Here, we will discuss one such use case of user defined transform.
We need to retrieve the employee name from employee information table via API based on the employee id in the order table which reside in our on-premise database.
Employee Table (this table will be accessed via API):
Employee ID | Employee Name | Country | Mail id |
100 | John | US | John@xxx.com |
101 | Jack | UK | Jack@xxx.com |
102 | Sunil | India | Sunil@xxx.com |
103 | Saroj | India | Saroj@xxx.com |
Order Table (source table we have in DS):
Order ID | Employee ID | Product ID | Payment Method |
10001 | 100 | 201 | Paypal |
10002 | 101 | 202 | Visa Card |
10003 | 102 | 203 | Payal |
10004 | 103 | 204 | Cash |
Target table required:
Order ID | Employee ID | Employee Name | Product ID | Payment Method |
10001 | 100 | John | 201 | Paypal |
10002 | 101 | Jack | 202 | Visa Card |
10003 | 102 | Sunil | 203 | Payal |
10004 | 103 | Saroj | 204 | Cash |
Here, it might look like a simple case of two tables and looking them up. However, the requirement is that the Employee table contains sensitive data and cannot be exposed directly to any tool, instead controlled APIs are available which control the data that can be accessed.
User defined transform generally works based on per record or per collection.
Per record – It will take each record from source and retrieve the output.
Per Collection – Use the Group Forming operations (Break Group and Candidate Selection) to group records for Python processing.
In our scenario, we need to retrieve the employee name that belongs to employee id based on each record. Hence, we are going with per record option. Below given block diagram will explain the transformation of data with user defined transform.
Sample Code Snippet
High Level Block diagram to illustrate the user defined transform
There are many more use cases and in some cases, the user-defined transform can be used to replace existing logic and help streamline your data flows and in other cases, establish a single version of the transform by reusing the transform in multiple transformations.