Of the many Transforms available in SAP Data Services 4.2, Data_Transfer transform is arguably one of the least utilized transforms in implementations considering its powerful features. This is because this transform provides for benefits that can be realized only after a job has been defined.
The data_transfer transform works by offloading SQL intensive operations to the database providing for faster and leaner job execution. Due to this nature, it is in most cases implemented after the initial job is defined to increase job performance, which does not happen too often because the focus is on optimizing existing job statements rather than try and push them down to the database. In this document, we will see the use case and implementation steps for the Data_transfer transform in Data Services jobs.
When to Use?
Data_transfer transform can be used to push-down certain resource intensive operations for complex dataflows to the database and split the data flow for faster processing.
It uses a staging transfer object to get data from the source and then uses that data for further processing. The transfer type can be table or file. It is not necessary that the table or file should pre-exist, Data Services creates the table/file during job execution. This is similar to the caching option, however, in this case, it gets in-built with the transform and does not have to be called explicitly.
How to use?
Consider a scenario where the transaction table has Sales with State Key. We lookup Regions based on State Key in QRY_LKP and load it into the target table as grouping by Region.
The optimized SQL for this flow will be displayed as below:
What is essentially happening is that the lookup is first performed and then the group by, all in memory. This will cause a lot of performance bottlenecks if the target data table is big. Also since the target data table belongs to a different data store, all the data is held in memory.
This can be addressed if there is some staging layer in between which holds the data and allows for pushing the calculations down to the database which gives us the advantage of being able to do parallel operations and reduce application memory consumption.
Now we change that data flow and insert a Data_Transfer transform before the Group By operation to push it down to the Database.
The Data_Transfer transform has the below options:
There are 3 Transfer Types available, the use of which will depend on the use case and data volumes.
On selecting File Transfer type, you need to specify the file name & the directory. Here, we will see how to use Table Transfer type.
If you chose Table Transfer type, other tabs will be enabled. Specify the table name in General tab from the existing datastore that can be either the persistent cache table or database table. It is always recommended to enable bulk loading since individual logs for the records are needed.
The optimized SQL for this new flow will be:
You can see that a target Optimized SQL statement is generated by Data Services with GROUP BY Clause for the target table on the inclusion of Data_Transfer transform. This will push-down the GROUP BY operation to the database. During execution, you can see in the log that the single dataflow is split and executed serially, thereby increasing the job performance.
The Data_Transfer transform can be used to push down your SQL operations wherever possible to increase the performance and release application memory.
Read more blogs from Data Science Category here.