We have all come across instances where files need to be copied from one location to another; for a one-off move, a migration utility comes to the rescue. But what if the copy needs to happen on a periodic basis, say daily or at every end-of-day (EOD) update?
Microsoft offers an Azure service called Data Factory that solves this very problem. Data Factory enables the user to create pipelines, and one pipeline can contain multiple activities spanning data transformation, data integration, and orchestration. Among them is a Copy activity, which copies files from a variety of sources to a variety of sinks.
To begin with the copy, the source and sink datasets need to be identified, and their associated linked services need to be created.
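As an illustration, a parameterized source dataset might look like the following sketch. The names (`SourceFileDataset`, `AdlsLinkedService`) and the CSV format are assumptions for this example, not values from the scenario above; the key idea is that the folder and file are dataset parameters resolved at run time.

```json
{
  "name": "SourceFileDataset",
  "properties": {
    "type": "AzureDataLakeStoreFile",
    "linkedServiceName": {
      "referenceName": "AdlsLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "folderPath": { "type": "String" },
      "fileName": { "type": "String" }
    },
    "typeProperties": {
      "folderPath": { "value": "@dataset().folderPath", "type": "Expression" },
      "fileName": { "value": "@dataset().fileName", "type": "Expression" },
      "format": { "type": "TextFormat", "columnDelimiter": "," }
    }
  }
}
```

A sink dataset pointing at the warehouse would be parameterized the same way, typically with a table-name parameter instead of a file path.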
Once the set-up is done, a new pipeline needs to be created with the ForEach activity as the main component. This acts like a parameterized for loop. The associated inner activity is the Copy activity, which is executed once per iteration, as many times as required. The source and sink datasets are also parameterized to make them dynamic.
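A minimal sketch of such a pipeline follows. The pipeline, activity, and dataset names are hypothetical, as are the item fields (`sourceFolder`, `sourceFile`, `sinkTable`); the structure shows how the ForEach activity iterates an array parameter and feeds each item into the Copy activity's dataset parameters.

```json
{
  "name": "BulkCopyPipeline",
  "properties": {
    "parameters": {
      "copyList": { "type": "Array" }
    },
    "activities": [
      {
        "name": "IterateCopies",
        "type": "ForEach",
        "typeProperties": {
          "items": { "value": "@pipeline().parameters.copyList", "type": "Expression" },
          "activities": [
            {
              "name": "CopyOneFile",
              "type": "Copy",
              "inputs": [
                {
                  "referenceName": "SourceFileDataset",
                  "type": "DatasetReference",
                  "parameters": {
                    "folderPath": "@item().sourceFolder",
                    "fileName": "@item().sourceFile"
                  }
                }
              ],
              "outputs": [
                {
                  "referenceName": "SinkTableDataset",
                  "type": "DatasetReference",
                  "parameters": { "tableName": "@item().sinkTable" }
                }
              ],
              "typeProperties": {
                "source": { "type": "AzureDataLakeStoreSource" },
                "sink": { "type": "SqlDWSink" }
              }
            }
          ]
        }
      }
    ]
  }
}
```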
In the example below, multiple files stored at dynamic locations in Azure Data Lake Store need to be copied into the dbo schema of an Azure SQL Data Warehouse.
The parameter given to the iterator is passed to the Copy activity and from there carried forward to the source and sink datasets. The parameter is segregated into source and sink portions, so which source dataset maps to which sink dataset is already defined in the parameter itself.
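Assuming the hypothetical item fields used throughout this example, the iterator's array parameter could look like this, with each element pairing a source file with its sink table:

```json
[
  { "sourceFolder": "/landing/2019/10/01", "sourceFile": "customers.csv", "sinkTable": "dbo.Customers" },
  { "sourceFolder": "/landing/2019/10/01", "sourceFile": "orders.csv", "sinkTable": "dbo.Orders" }
]
```

Adding a new file-to-table copy then means appending one element to this array, with no change to the pipeline definition.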
For each individual copy, an explicit source-to-sink column mapping is not required; it is resolved dynamically. When loading into a database or warehouse, the automatic mapping is sequential (by column position) if column headers are absent; if headers are present, they are matched by name and each source column is copied to the corresponding sink column. Make sure the column headers match exactly, as the matching is case-sensitive.
The iterator accepts a default parameter, so the pipeline can be executed in conjunction with other pipelines. Pipelines can be scheduled, and thus the copies can be scheduled in bulk.
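A daily EOD schedule, for instance, can be attached with a schedule trigger along these lines. The trigger name, start time, and pipeline reference are illustrative assumptions:

```json
{
  "name": "DailyEodTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2019-10-01T18:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "BulkCopyPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```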
The same pipeline can be used to copy a single file with the same datasets, which makes it one of the most helpful assets when debugging or re-running the pipeline to apply a fix.
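In that debugging scenario, the pipeline can simply be triggered with a one-element array (again using the hypothetical field names from the earlier sketches), re-copying only the failed file:

```json
{
  "copyList": [
    { "sourceFolder": "/landing/2019/10/01", "sourceFile": "orders.csv", "sinkTable": "dbo.Orders" }
  ]
}
```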