Triggers act like job schedulers for pipeline runs in Azure Data Factory (ADF). At the time of writing, ADF supports three types of triggers:
- Schedule trigger: A trigger that executes a pipeline on an absolute schedule
- Tumbling window trigger: A trigger that operates at periodic intervals and also retains state
- Event-based trigger: A trigger that responds based on events
In this blog, we will discuss the tumbling window trigger and how it supports fetching historical data in Azure Data Factory.
Tumbling Window Trigger
A typical ETL package is built to process data from a given point in time onward. Historical data from before that point is therefore not loaded from source to target, and a separate load is needed to fill the gap. Adding this missing past data to the target is called historical data collection, or data backfilling. In traditional ETL, backfilling takes a great deal of manual work and time to build effective SQL scripts. Azure Data Factory's tumbling window trigger is well suited to this job. It fires over a series of fixed-size, non-overlapping, contiguous time intervals from a specified start time and retains the state of each window. Because the start time can lie in the past, the past windows are processed as a backfill.
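The idea of "non-overlapping and contiguous" windows can be made concrete with a small sketch. This is an illustration of the concept, not ADF's internal scheduler. The function name `tumbling_windows` is hypothetical, and the sketch assumes that only whole windows ending on or before `end` are produced.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

def tumbling_windows(start: datetime, end: datetime,
                     recurrence: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield [window_start, window_end) intervals of length `recurrence`, beginning
    at `start`. Assumption of this sketch: only whole windows that end on or before
    `end` are produced; a trailing partial window is skipped."""
    if recurrence <= timedelta(0):
        raise ValueError("recurrence must be positive")
    window_start = start
    while window_start + recurrence <= end:
        window_end = window_start + recurrence
        yield window_start, window_end
        # The next window begins exactly where this one ended: no gaps, no overlap.
        window_start = window_end

# Two-hour windows from midnight up to 07:00 (hypothetical values).
for ws, we in tumbling_windows(datetime(2019, 4, 19, 0), datetime(2019, 4, 19, 7), timedelta(hours=2)):
    print(ws.isoformat(), "->", we.isoformat())
```

Each window's end is the next window's start, so every instant after the start time belongs to exactly one window. This property makes a window a safe unit for an incremental or backfill load.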
Execution of the Tumbling Window Trigger
First, create a tumbling window trigger from the Triggers tab in Azure Data Factory by defining the following properties:
- Name – The trigger name
- Type – The trigger type, 'Tumbling Window'
- Start Date (UTC) – The first occurrence of the trigger; it can be in the past
- Recurrence – The frequency unit and interval at which the trigger recurs. The accepted units are minutes and hours. For example, for a trigger that runs once a day, set the recurrence to 24 hours
- End (UTC) – The last occurrence of the trigger; it can also be in the past
- Delay – The amount of time to wait before starting data processing for a window. The delay does not alter the Start Date (UTC)
- Max Concurrency – The number of simultaneous trigger runs that are fired for windows that are ready
- Retry Policy: Count – The number of retries if the pipeline run fails
- Retry Policy: Interval in seconds – The delay between retry attempts
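Behind the UI, these properties end up in the trigger's JSON definition. The sketch below builds that `typeProperties` fragment from Python values. The key names (`frequency`, `interval`, `startTime`, `endTime`, `delay`, `maxConcurrency`, `retryPolicy`) follow the trigger schema in Microsoft's documentation as I understand it. The helper names, the defaults, and the 1 to 50 `maxConcurrency` range are assumptions to verify against the current docs.

```python
import json
from datetime import datetime, timedelta
from typing import Optional

def _utc(ts: datetime) -> str:
    # ISO 8601 UTC timestamp, e.g. 2019-04-19T00:00:00Z
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")

def _timespan(delay: timedelta) -> str:
    # hh:mm:ss; this sketch only handles delays shorter than one day
    total = int(delay.total_seconds())
    if not 0 <= total < 86400:
        raise ValueError("delay must be between 0 and 24 hours")
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def type_properties(frequency: str, interval: int, start: datetime,
                    end: Optional[datetime] = None, delay: timedelta = timedelta(0),
                    max_concurrency: int = 1, retry_count: int = 0,
                    retry_interval_seconds: int = 30) -> dict:
    """Hypothetical helper: map the UI properties to a typeProperties dict."""
    if frequency not in ("Minute", "Hour"):  # the units named in this article
        raise ValueError("frequency must be 'Minute' or 'Hour'")
    if interval < 1:
        raise ValueError("interval must be positive")
    if not 1 <= max_concurrency <= 50:  # assumed documented range
        raise ValueError("maxConcurrency must be between 1 and 50")
    props = {
        "frequency": frequency,
        "interval": interval,
        "startTime": _utc(start),
        "delay": _timespan(delay),
        "maxConcurrency": max_concurrency,
        "retryPolicy": {"count": retry_count, "intervalInSeconds": retry_interval_seconds},
    }
    if end is not None:
        props["endTime"] = _utc(end)
    return props

# "Once every day" = frequency Hour, interval 24, starting in the past to backfill.
props = type_properties("Hour", 24, datetime(2019, 4, 19), max_concurrency=2, retry_count=1)
print(json.dumps({"type": "TumblingWindowTrigger", "typeProperties": props}, indent=2))
```

Expressing a daily trigger as 24 hours, rather than as a separate "Day" unit, matches the article's description of the accepted units.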
In the pipeline section, select the pipeline that the tumbling window trigger should run to backfill the data.
In the following example, I executed a pipeline run through a daily tumbling window trigger to fetch historical data for the past few days. I took 04/22/2019 as the current date and set the start date to 04/19/2019, three days earlier. The trigger therefore backfills every daily window that has fully elapsed between the start date and the current time.
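A quick way to predict how many backfill runs a past start date will produce is to count the whole windows between the start time and the current time. The sketch below assumes that a window counts once its end time has been reached. The times of day used are hypothetical, because the example gives only dates.

```python
from datetime import datetime, timedelta

def completed_windows(start: datetime, now: datetime, recurrence: timedelta) -> int:
    """Hypothetical helper: number of whole windows between `start` and `now`."""
    if now <= start:
        return 0
    return (now - start) // recurrence  # timedelta // timedelta gives an int

daily = timedelta(hours=24)
# Both dates at midnight UTC (assumed times of day).
print(completed_windows(datetime(2019, 4, 19, 0, 0), datetime(2019, 4, 22, 0, 0), daily))
# Start at 09:50, "now" at 08:30 on the current date (assumed times of day).
print(completed_windows(datetime(2019, 4, 19, 9, 50), datetime(2019, 4, 22, 8, 30), daily))
```

With the same two dates, the count is 3 in the first case and 2 in the second. The number of backfilled days therefore depends on the time of day of the start date and of the current time, not only on the dates.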
Each run of the trigger exposes the following output values:
- Trigger Time – The time at which the trigger run was fired
- windowStartTime – The start of the window being processed. The first windowStartTime is the Start Date (UTC) set by the user. Each subsequent windowStartTime is calculated by adding the recurrence value to the previous windowStartTime, so the windows stay aligned to the periodic interval.
- windowEndTime – The end of the window, calculated by adding the recurrence value to the windowStartTime.
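The two rules above amount to a simple recurrence. Here $R$ is the recurrence interval and $n = 0, 1, 2, \dots$ indexes the windows; the notation is my own shorthand, not ADF syntax.

```latex
\begin{aligned}
\text{windowStartTime}_0 &= \text{StartDate} \\
\text{windowStartTime}_n &= \text{windowStartTime}_{n-1} + R = \text{StartDate} + nR \\
\text{windowEndTime}_n   &= \text{windowStartTime}_n + R = \text{windowStartTime}_{n+1}
\end{aligned}
```

For example, with StartDate = 04/19/2019 00:00 UTC (an assumed time of day) and $R$ = 24 hours, window $n = 2$ runs from 04/21/2019 00:00 to 04/22/2019 00:00. The Delay property does not appear in these formulas because it postpones processing without changing the window boundaries.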
The windowStartTime and windowEndTime values can be passed to the pipeline run to fill timestamp parameters within an activity, so that each run fetches only its own slice of historical data in Azure Data Factory. Add a dynamic parameter for each timestamp and map it using the following expressions:
- windowStartTime – @trigger().outputs.windowStartTime
- windowEndTime – @trigger().outputs.windowEndTime
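A minimal sketch of this wiring follows. The first function builds the trigger's `pipeline` section in the shape shown in Microsoft's tumbling window documentation, as I understand it. The pipeline name `CopyHistoricalSales` and the parameters `WindowStart`/`WindowEnd` are hypothetical. The second function shows one common way an activity can use the two values: a half-open filter, an assumed pattern rather than something ADF enforces. Under that filter, adjacent windows never load the same row.

```python
import json
from datetime import datetime, timedelta

def pipeline_section(pipeline_name: str) -> dict:
    # One trigger runs one pipeline: "pipeline" is a single object, not a list.
    return {
        "pipelineReference": {"type": "PipelineReference", "referenceName": pipeline_name},
        # Hypothetical pipeline parameters that receive each window's bounds.
        "parameters": {
            "WindowStart": "@trigger().outputs.windowStartTime",
            "WindowEnd": "@trigger().outputs.windowEndTime",
        },
    }

def rows_in_window(rows, window_start, window_end):
    # Half-open filter [start, end): a row stamped exactly on a boundary
    # is picked up by the later window only, so no row is loaded twice.
    return [r for r in rows if window_start <= r["modified"] < window_end]

print(json.dumps(pipeline_section("CopyHistoricalSales"), indent=2))

day = timedelta(days=1)
# Hypothetical source rows, one every 12 hours from 04/19/2019 00:00.
rows = [{"id": i, "modified": datetime(2019, 4, 19) + timedelta(hours=12 * i)} for i in range(6)]
windows = [(datetime(2019, 4, 19) + k * day, datetime(2019, 4, 19) + (k + 1) * day) for k in range(3)]
loaded = [r["id"] for ws, we in windows for r in rows_in_window(rows, ws, we)]
print(loaded)
```

Inside the pipeline, the same filter is typically written in the source query against the pipeline parameters. Using `>=` on the start and `<` on the end keeps the backfill free of duplicates.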
Points to Note
- The trigger name acts as a unique identifier. Suppose you modify a trigger's properties after it has been published and at least one run has completed. The change will not affect or re-execute past runs, because the trigger is tracked by its name and that backfill is considered already complete. The modifications apply only to future pipeline runs.
- Once the trigger is published, the start date cannot be modified; the end date, however, can be.
- The trigger runs at the regular interval set by Recurrence, counted from the start date. After the trigger is published, it waits until the next recurrence time, and only then does the backfill start. For example, if I schedule a run for 9:50 AM with a frequency of 24 hours and publish the trigger at 8:30 AM, the trigger waits until 9:50 AM to kick off the pipeline run.
- A tumbling window trigger can execute only one pipeline; the trigger and the pipeline have a one-to-one relationship.