Azure Data Factory (ADF) is a cloud-based data integration service that offers 90+ built-in connectors to orchestrate data from sources such as Azure SQL Database, SQL Server, Snowflake, and APIs. ADF recently added a Snowflake Connector, which lets you extract data from, and load data into, Snowflake alongside any of your existing legacy or modern databases and data warehouses. In this blog, we will outline the functionalities, architecture, features, use cases, limitations, and roadmap of this newly available Snowflake Connector.
The Snowflake Connector supports both the Azure and the Self-hosted Integration Runtime, and it supports the following ADF activities in a pipeline:
- Copy Activity
  - Source – Snowflake on any cloud provider
  - Sink – Snowflake on Azure
- Lookup Activity
- With Snowflake as a source, you can load/integrate data into any of the ADF-supported sinks; the connector uses Snowflake’s COPY INTO <location> statement to achieve the best performance.
- With Snowflake as a target, you can ingest data from any of the ADF-supported sources; the connector uses Snowflake’s COPY INTO <table> statement to achieve the best performance.
- You can also look up data from a Snowflake table/view to drive control flows in ADF.
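To illustrate, the statements the connector issues against Snowflake are similar to the following sketch. The stage URL, SAS token, database, table, and column names here are illustrative assumptions, not what ADF generates verbatim:

```sql
-- Unload (Snowflake as source): export query results to the Blob staging area.
COPY INTO 'azure://myaccount.blob.core.windows.net/staging/orders/'
FROM (SELECT order_id, customer_id, amount FROM sales.public.orders)
CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
OVERWRITE = TRUE;

-- Load (Snowflake as sink): ingest the staged files into the target table.
COPY INTO sales.public.orders
FROM 'azure://myaccount.blob.core.windows.net/staging/orders/'
CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
ON_ERROR = 'ABORT_STATEMENT';
```

In both directions, the heavy lifting happens inside Snowflake’s COPY INTO engine rather than row by row through the integration runtime, which is why the connector scales well.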
Figure 1 depicts the data flow from Snowflake to any other target system using the Snowflake Connector in Azure Data Factory.
Figure 2 depicts the data flow from any source system to Snowflake using the Snowflake Connector in Azure Data Factory.
- When you are using Snowflake as a source, you can configure a table as the source or write a SELECT query to read data from existing tables or views in Snowflake.
- When you are using Snowflake as a sink, you can specify a SQL script for the Copy activity to run before writing data into Snowflake in each run. A common use case is cleaning up previously loaded data.
- Additional Snowflake Copy Options – you can configure the copy options available in Snowflake’s COPY INTO <table/location> statement, for example MAX_FILE_SIZE, OVERWRITE, ON_ERROR, FORCE, and LOAD_UNCERTAIN_FILES.
- Additional Snowflake Format Options – you can configure the format type options available in Snowflake’s COPY INTO <table/location> statement, for example DATE_FORMAT, TIME_FORMAT, and TIMESTAMP_FORMAT.
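Put together, a Copy activity that uses a source query, a pre-copy script, and additional copy/format options might look roughly like the following pipeline JSON. This is a hedged sketch: the activity, query, and linked-service names are illustrative, and the exact property names should be verified against the current ADF documentation:

```json
{
  "name": "CopySqlToSnowflake",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "SnowflakeSource",
      "query": "SELECT ORDER_ID, AMOUNT FROM ORDERS"
    },
    "sink": {
      "type": "SnowflakeSink",
      "preCopyScript": "DELETE FROM ORDERS_STG WHERE LOAD_DATE = CURRENT_DATE",
      "importSettings": {
        "type": "SnowflakeImportCopyCommand",
        "additionalCopyOptions": {
          "ON_ERROR": "CONTINUE",
          "FORCE": "TRUE"
        },
        "additionalFormatOptions": {
          "DATE_FORMAT": "'YYYY-MM-DD'",
          "TIME_FORMAT": "'HH24:MI:SS.FF'"
        }
      }
    },
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": {
        "referenceName": "AzureBlobStaging",
        "type": "LinkedServiceReference"
      }
    }
  }
}
```

Note that `enableStaging` and `stagingSettings` reflect the mandatory Blob Storage staging discussed in the limitations below; the copy and format options are passed through to the generated COPY INTO statement.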
- Integrating Snowflake with legacy or modern databases and data warehouses becomes simple with the help of ADF’s Snowflake connector.
- Using ADF’s Snowflake connector, semi-structured data can be loaded from JSON files into a VARIANT column of a Snowflake table.
- ADF’s Snowflake connector uses the COPY INTO <table/location> statement, Snowflake’s recommended and most performant way to load and unload data, and it scales well for large data volumes.
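As an illustration of the JSON-to-VARIANT pattern, the equivalent raw Snowflake SQL might look like this (the table name, staging URL, and JSON paths are illustrative assumptions; when you use the connector, ADF generates the COPY INTO for you):

```sql
-- Target table with a single VARIANT column for semi-structured data.
CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT);

-- Load JSON files from the (illustrative) Blob staging location.
COPY INTO raw_events
FROM 'azure://myaccount.blob.core.windows.net/staging/events/'
CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
FILE_FORMAT = (TYPE = JSON);

-- Query into the semi-structured payload with path notation and casts.
SELECT payload:user.id::STRING   AS user_id,
       payload:event_type::STRING AS event_type
FROM raw_events;
```

Once loaded, the VARIANT column can be queried directly with Snowflake’s path syntax, so downstream transformations do not need a rigid schema up front.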
- Intermediate staging in Azure Blob Storage is mandatory when using the Snowflake connector. However, the staged data is deleted automatically after each run.
- The intermediate stage (Azure Blob Storage) must be authenticated with a Shared Access Signature (SAS), which provides secure, time-limited delegated access to resources in your storage account. Because a SAS is valid only for a defined time span, it must be regenerated whenever it expires.
- Calling a Snowflake stored procedure is not supported, either in the source query or in the pre-copy script when Snowflake is the target.
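For reference, a time-limited SAS for the staging container can be generated with the Azure CLI; the account name, container name, permissions, and expiry below are illustrative and should be adjusted to your environment:

```shell
# Generate a SAS token for the ADF staging container (illustrative values).
az storage container generate-sas \
  --account-name mystagingaccount \
  --name adf-staging \
  --permissions racwdl \
  --expiry 2025-12-31T23:59Z \
  --https-only \
  --output tsv
```

The emitted token is then pasted into the staging linked service’s SAS URI; when it expires, the pipeline fails to authenticate until a new token is generated.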
- Support for Snowflake as a source and sink in Mapping Data Flows, to provide seamless ETL transformations. For now, you can use the Copy activity in conjunction with Data Flows.
- Support for Snowflake Stored Procedures.
Learn more about Visual BI’s Microsoft Azure offerings here.