dbt is a leading transformation tool used to develop data transformations in Snowflake. Over the last few years of working with many different types of companies, we have learned a lot about how to approach both common and uncommon scenarios. This blog will walk through a few of those scenarios in a dbt-Snowflake project and some tips and tricks to handle them.
Scenario 1: Renaming tables loaded using dbt seed
In dbt, you can load CSV files present in the data directory as tables into Snowflake using dbt seed. However, dbt seed will create tables in Snowflake with the same name as the file name present in the data directory.
Consider a scenario where you have three versions of the same file in the data directory, one each for development, quality, and production, as seen below.
If you run the following command in the production environment, it will create a table in the name country_prod as per the dbt seed command’s default behavior.
dbt seed –select country_prod
If you would like to have the name of the target table as COUNTRY, you may have to do some workaround for the same irrespective of the environment. The following macro and the post hook settings for seeds in the file dbt_project.yml will ensure that the table name is always COUNTRY.
Settings to be done in dbt_project.yml.
Scenario 2: Creating transient tables in development and permanent table in production
In a development environment where no business-critical data is present, it is preferable to create transient tables in Snowflake instead of permanent tables since transient tables are more cost-effective. However, you may want to create permanent tables in the production environment to store business-critical data since Snowflake offers more capabilities like Time Travel Retention and Fail-Safe periods. Depending on the target environment, you can choose whether a table must be transient or permanent. This requirement can be handled by setting up the dbt_project.yml file, as seen below.
Here, the string ‘prod’ refers to the target name of the production environment set up in the file dbt_project.yml.
Scenario 3: Reduce run time of dbt run
As the number of objects in the dbt project grows, you may run into scenarios where the dbt run command runs for a long time before firing commands to the database. This is because, on every dbt run, all the objects inside the project are parsed to create the Directed Acyclic Graph (DAG). The time it takes for parsing is directly proportional to the number of objects and the complexity of the dbt project. This can be avoided by the following setup in profiles.yml.
When the above configuration is used, dbt will not parse objects on every dbt run unless there is a change in files in the dbt project after the dbt run command’s last invocation. This will save a lot of time, especially in projects with many objects with complex dependencies.
Scenario 4: Environment specific sequence names
At times, it is necessary to use SQL sequences for surrogate keys. However, you must not directly mention the name of the sequence in the model, as seen below, as the model design must be environment-independent.
Like how you declare tables as Sources, sequence names can also be declared as Sources in .yml file.
After setting up the sources in the .yml file, the model can be designed as seen below.
We hope this blog gave you an idea of some real-time scenarios in a dbt-Snowflake project and how to use available dbt features to meet the requirements.
Please visit the link to explore more blogs on Snowflake and dbt.