First, we will discuss why Snowflake is getting very popular with the enterprises?
- It acts as a modern cloud data warehouse, data lake, data mart, Open Data Store.
- It supports various programming languages Python, Java, .NET, node JS, C, SQL Alchemy.
- Served in many platforms such as Machine Learning, Data Science, Business Intelligence, Data Integration, Security & Governance.
- It separates the storage and computation layer for scalability and cost-effectiveness.
Click here to know about the Snowflake ecosystem for modern cloud Datawarehouse
In this blog, you get to know why dbt is best suited for data transformation in Snowflake.
Suppose you are a Snowflake customer looking for a data transformation tool. Then dbt fits very well into the list of transformation tools choices as it can orchestrate data modeling and data transformation work. dbt can perform the transformation on data already loaded in your data warehouse. It can do ‘T’ in ‘ELT’ (Extract, Load, Transform).
Let us see the dbt roles converting data into a business model in Snowflake. If you are writing SQL to achieve transformation logic in Snowflake, then dbt can be your first option.
dbt Development Environment
- dbt CLI (Command Line Interface)
- dbt IDE build on dbt Cloud
Business to dbt dictionary
What makes dbt most useful with Snowflake?
Tables to view conversion
If you want to change the list of 100 tables into view. In Snowflake, you need to drop and create SQL statements or modify stored procedures. However, it is a time-consuming process
But in dbt, you can easily do this by changing materialization in a single file.
Stored Procedure Code
Consider you have written stored procedures to automate the process. One month later, if you want to see the underlying code, there is no direct way of doing it. But in dbt, those are shown in the model. Using the do statement, you can execute a set of statements dynamically in the model itself.
Grouping transformation logic
Let us say you have written SQL transformation logic for ten tables. If you want to group the logics into three different names like associate (logic 1,2,3), leave (logic 4,5,6,7,8), attendance (logic 9,10), and execute each group separately using dbt tags.
If more than one developer wants to work on the same objects, conflict is bound to occur. Here, dbt supports version control by tight integration with GitHub.
Usually, software developers share their solutions for a problem in an open-source community. This can help other developers to build on other’s experiences rather than starting from scratch.
Similarly, dbt provides a platform to share your analysis with the outside world using dbt packages. Eventually, for the frequently used sources, we will have a plug and play solution.
Click here to get some dbt packages
Here I am going to illustrate one of the dbt package: dbt -utils packages. It is a generalized package for a basic operation which includes a predefined function (dbt macro) for
Date and time
- Getting current timestamp
- Adding, subtracting date
- Truncate date
- Fetching last day
- Split text by a delimiter
- Distance between two coordinates
- Getting a unique column value
- Result as dictionary
- Generate series
- Generate surrogate key
- Pivot/unpivot values from rows to columns
By using dbt, you can spend less time on development and more time on analysis, other than transformation. Overall dbt offers version control, data validation, quality assurance, auto-documentation, and packages to support the development process.
Even though other data transformation tools like Azure Data Factory, Apache Kafka, MATILLION ETL, SAP Data Services, SnapLogic, Stitch, Trifacta, etc., are available for Snowflake – dbt is very popular because it is open source and very easy to use