Persistent Derived Tables in Looker is one of the unique key features. Many BI software allows us to create derived tables but they are local to that particular software tools. Also when we talk about persistent tables, the first concern everyone raises is the performance. In this blog, we will see how Looker is changing the perception of Derived tables.
With Looker’s Persistent Derived Tables we can extend the capabilities of Looker to use it as a modelling tool. Looker supports many dialects, we can use LookML capabilities to create Derived tables and persist them in the database. In the database, these tables can act like your normal database objects.
At a high level, Looker’s derived table functionality provides a way to create new tables that don’t already exist in the database to extend our analysis. In some cases, they can also play a valuable role in enhancing query performance.
Below, we will see how to configure PDT’s (Persistent Derived Tables) in Looker.
Looker’s ability to provide PDTs depends on whether the database dialect supports them or not. Snowflake, Redshift, Google Big Query, SAP HANA, DB2, Azure Warehouse and much more support PDT’s. Follow here for updates.
The image below shows a sample of PDT enabled Looker database connection.
Where to Configure and Prerequisites?
In the Connections parameters, check the box “Persistent Derived Tables” to enable the feature when creating a connection or for the existing connection.
This reveals the Temp Database field and the PDT Overrides column. Looker displays this option only if the database dialect you chose supports using PDTs.
Temp Database is a Prerequisite.
Although this is labelled Temp Database, we will enter either the database name or schema name that Looker should use to create persistent derived tables with the appropriate write permissions.
Each connection must have its own Temp Database or Schema; they cannot be shared across connections.
Identify PDT and Datagroup Maintenance Schedule
This setting accepts a cron expression that indicates when Looker should check data groups and persistent derived tables (that are based on sql_trigger_value) to be regenerated or dropped. Below are sample cron expressions to use.
|*/5 8-17 * * MON-FRI||Check data groups and PDTs every 5 minutes during business hours, Monday through Friday=|
|*/5 8-17 * * *||Check datagroups and PDTs every 5 minutes during business hours, every day|
|0 8-17 * * MON-FRI||Check datagroups and PDTs every hour during business hours, Monday through Friday|
|1 3 * * *||Check datagroups and PDTs every day at 3:01 am|
Below are a couple of resources to assist with creating cron strings
- https://crontab.guru Help to edit and testing cron strings.
- https://www.crontab-generator.org Select time settings and the generator creates the corresponding cron string.
Please check here to configure sql_trigger_value regeneration for your database.
How to define Derived Tables?
We can define a derived table in one of these ways;
- Using LookML to define a Native Derived Table (NDT):
These derived tables are defined in LookML, referring to dimensions and measures in the model.
- Turning a SQL query into a derived table definition:
These derived tables are defined in SQL, referring to tables and columns in your database. We can use SQL Runner to create the SQL query and turn it into a derived table definition. We cannot refer to LookML dimensions and measures in a SQL-based derived table.
Making a Derived Table Persistent
There is no parameter that means “make this derived table persistent.” Rather, persistence is created by the addition of the datagroup_trigger, sql_trigger_value, or persist_for parameter.
Please find details here
These PDTs are rebuilt periodically based on one of three settings details here
- triggered by a change (using sql_trigger_value)
- a set time period (using persist_for)
- a caching policy definition in a data group, triggered by a change using SQL trigger, and assigned to the PDT using datagroup_trigger
Where to find created Persisted Derived Tables and Datagroups?
Find already created Datagroups and Persisted Derived Tables LookML under Admin->Database section. Clicking on LookML will take us to model where we defined these objects.
We can see default “ecommerce_etl” data group and default Persistent Derived tables.
There are some situations where you should avoid persistence. Persistence should not be added to derived tables that will be extended (The extends parameter allows you to reuse code), because each extension of a PDT will create a new copy of the table in your database. Also, persistence cannot be added to derived tables that make use of templated filters or Liquidparameters. There is potentially an infinite number of possible user inputs with those features, so the number of persistent tables in your database could become unmanageable.
Looker provides great flexibility to use PDT’s for performance by persisting results of complex queries, we can write results to the database. We can create PDT’s with LookML or with direct SQL for the database. We can regenerate or drop with PDT maintenance schedule and SQL trigger parameter values. Looker allows to Index PDT’s, in fact, it is a must. PDT is huge subject in Looker, please go through here for better understand caching and data groups .