For Alteryx developer, it is good to know the advantage of using Calgary DB over other flat files. It will ramp up your workflow performance. As mentioned in the Alteryx forum, Calgary is a data retrieval engine to perform analysis on a large dataset using indexing methodology. This blog compares Alteryx DB, Calgary DB and CSV in varies scenarios. To demonstrate each scenario, we have taken dataset which contains eleven million records.

Scenarios to Compare

We have taken the below scenarios based on the tools available in Calgary

  • Creation
    1. Creating CSV
    2. Creating Alteryx DB
    3. Creating Calgary DB
  • Data retrieval
    1. CSV
    2. Alteryx DB
    3. Calgary DB
  • Data Joining
    1. CSV
    2. Alteryx DB
    3. Calgary DB
  • Record Counting
    1. CSV
    2. Alteryx DB
    3. Calgary DB

1. Time-consuming for creation

In this scenario, we have taken our dataset as CSV format, it will act as Source. Then our workflow will create CSV, Calgary DB(CYDB), Alteryx DB(YXDB) from the source file. Here we’re capturing workflow execution time in order to determine best file format to create a large dataset.

performance-enhancement-alteryx-significance-calgary-db

CSV

 

performance-enhancement-alteryx-significance-calgary-db

Alteryx DB

 

performance-enhancement-alteryx-significance-calgary-db

Calgary DB

 

2. Time-consuming for Data retrieval

In the previous workflow, we just converted our source file as Alteryx DB, Calgary DB, CSV. Then our workflow will filter records where a street column contains string ACR in all three formats. To determine, best file format to data retrieval from the large dataset we need to capture workflow execution time. Check the workflows below:

performance-enhancement-alteryx-significance-calgary-db

CSV

 

performance-enhancement-alteryx-significance-calgary-db

Alteryx DB

 

performance-enhancement-alteryx-significance-calgary-db

Calgary data

 

3. Time-consuming for Data joining

So far, we’ve created and retrieved data from all three formats. How about performing joining across three formats? Let’s create the workflow to determine the best file format to perform the join operation.

performance-enhancement-alteryx-significance-calgary-db

CSV

 

performance-enhancement-alteryx-significance-calgary-db

Alteryx DB

 

performance-enhancement-alteryx-significance-calgary-db

Calgary DB

4. Time-Consuming for Record Counting

In this scenario, we’ll perform aggregation on a large dataset using all three format to conclude the best file format for finding a record count of a specific field.

performance-enhancement-alteryx-significance-calgary-db

CSV

 

performance-enhancement-alteryx-significance-calgary-db

Alteryx DB

 

performance-enhancement-alteryx-significance-calgary-db

Calgary DB

 

Performance Summarization

Process

CSVAlteryx DB

Calgary DB

Creation

11.3 Seconds10 Seconds2:34 Minutes

Data Retrieval

10.2 Seconds3.4 Seconds0.2 Seconds
Joining17.8 Seconds12.7 Seconds

1.2 Seconds

Counting14.6 Seconds9.3 Seconds

0.6 Seconds

 

The significant difference observed between creation for CSV, Alteryx DB Vs Calgary DB is mainly because, with Calgary, a separate index file is generated for every dimension considered which adds up the additional delay in performance.

With the dataset in consideration, it can be inferred that the performance of Calgary DB is significantly better than CSV and Alteryx DB in terms of aggregation like counting records, performing joins or retrieval of data.

Reference

https://help.alteryx.com/2018.3/calgaryindex.htm

https://www.thedataschool.com.au/jonathan-waerner/using-calgary-in-alteryx/

 

Read more about similar Self Service BI topics here and read more blogs from Alteryx category here.

Subscribe to our Newsletter

5920 Windhaven Pkwy
Plano TX 75093.

+1 888-227-2794

+1 972-232-2233

+1 888-227-7192

solutions@visualbi.com