Nowadays the amount of Semi-structured data used is significantly higher compared to the previous decade. This is because the value of information being derived from data provides more insights, and these insights help businesses make informed decisions. Therefore, it becomes important to efficiently load data that is used for analysis.
Considering the role of Semi-structured data, it can easily deal with the heterogeneity of sources and the data is portable. Moreover, an XML file is widely used to store and exchange Semi-structured data and allows the user to define tags and attributes to store the data in a hierarchical form. Semi-structured data support the users who cannot define their needs in the structured query language like SQL etc..
Snowflake provides options to load Semi-structured data like JSON, Avro, Parquet etc using Variant type. The loaded data can be accessed from Tableau using Custom SQL options for visualizing purposes.
How Semi-structured data is stored in Snowflake?
The support for Semi-structured data that includes flexible-schema data types for loading the data without transformations is provided by Snowflake. The following data types: Variant, Object and Array are used to signify arbitrary data structures. Based on which we could import and operate on Semi-structured data.
A variant in Snowflake can store values of any data type up to a compressed size of 16MB including objects and array. The Semi-structured data can be loaded as a variant type which is stored as separate physical columns in Snowflake. The data stored can then be loaded to Tableau for further analysis.
Loading Data in Snowflake
The JSON file that contains the data can be loaded in Snowflake storage using the following steps:
Login to your Snowflake and create a database and schema either using GUI or query.
- Create database DB_NAME;
- Create schema SCHEMA_NAME;
Regardless of the data stored internally or externally, the location where the data stored is known as ‘Stage’. We must create a stage to store our data to be analysed.
- Create a stage “DB_NAME”.”SCHEMA_NAME”.”STAGE_NAME”
- Create File Format with data type JSON. Since the Semi-structured data that is used here is of the JSON format. Use the ‘CREATE’ option under the File Formats tab.
A dialog box appears where you can name the file format and choose its format type.
- A table with a Variant column type must be created to load our Semi-structured data into it.
- Create or replace table TABLE_NAME (Col_Name Variant);
- Load the JSON file under the Tables tab by choosing the table created using the User Interface.
Flatten the data
Now we have our data in a single column variant. We need to view the data in a structured format. Flatten is a table function that is used to convert the Semi-structured data to a relational table structure. The data with type Variant can be given as an input through an expression. This eliminates the need for an explicit reference for the array locations.
To query the values in a Semi-structured dataset loaded into the table, enter the below SQL code
Select * from TABLE_NAME, lateral flatten(input=>VariantColumn.array_pointer)
The values in JSON format are stored as arrays and can be viewed as individual records. The format for querying specific values from JSON is as below:
Select v.value:field_name::Data type Name_of_Column from TABLE_NAME, lateral flatten(input=>Field to be flattened) v
Select v.value:rain::Varchar Rain from WEATHER_TABLE lateral flatten(input=> C1) v
Custom SQL in Tableau
The Custom SQL feature in Tableau can be used to write a SQL query to retrieve the relevant data. This is most useful when you know the specific data required for analysis which saves a lot of effort.
Connect Tableau to Snowflake and drag/click the ‘New Custom SQL’ option which automatically opens a dialog box where the custom SQL can be written. The results of the query can be previewed using the ‘Preview Results’ option.
Now the queried data can be used to create visualisations in Worksheets of Tableau.
The data queried by Custom SQL can be published to Tableau Server to reuse in other workbooks or across multiple users. This ensures governance and consistent version of the JSON data being used for analysis.
Want to know more about connecting to Snowflake from Tableau? Click here.