The BI space has grown tremendously over the last few decades, and recently Snowflake has been creating a lot of buzz. Built entirely in the cloud on a multi-cluster shared data architecture, Snowflake offers unique features that give it a great head start and make it a potential winner among the data warehouses on the market.
Advantages of Snowflake Architecture
Built entirely on SQL, Snowflake not only stores structured data but also handles semi-structured data such as JSON, XML, and Avro. Data is sorted and stored in micro-partitions in a columnar format, which accounts for fast data retrieval. Snowflake also has an impressive caching mechanism: query results are retained in the result cache for 24 hours, and if a new query matches a cached result and the underlying data hasn't changed, the result is returned immediately without any recomputation. Snowflake has three caches: the Result Cache, the Local Disk Cache, and the Remote Disk Cache.
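As a sketch of the semi-structured support mentioned above, JSON can be loaded into a VARIANT column and queried with path notation — no upfront schema required. The table and field names here are hypothetical:

```sql
-- Hypothetical table: raw JSON events land in a single VARIANT column
CREATE TABLE raw_events (payload VARIANT);

-- Semi-structured fields are addressed with path notation and casts
SELECT
    payload:user.id::STRING    AS user_id,
    payload:event_type::STRING AS event_type
FROM raw_events
WHERE payload:event_type = 'login';
```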
Cloning a database creates a copy of an existing database in seconds without disturbing the original. A clone doesn't duplicate any data; it is performed as a metadata-only operation, so the cloned data adds no extra storage cost.
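A zero-copy clone is a single statement; the object names below are illustrative:

```sql
-- Metadata-only operation: completes in seconds, no data is duplicated
CREATE DATABASE analytics_dev CLONE analytics_prod;

-- Schemas and tables can be cloned the same way
CREATE TABLE orders_backup CLONE orders;
```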
Another notable feature of Snowflake is its hybrid architecture. Snowflake combines the advantages of traditional shared-disk and shared-nothing database architectures by separating the storage layer from compute, so data can be loaded and queried at the same time without contention. The storage layer resides in scalable cloud storage such as Amazon S3, while computation is handled by Virtual Warehouses, which query the databases and can be created, resized, and deleted dynamically depending on resource usage. Combined with the cloud's scalability and near-zero management, this makes Snowflake a prime player in the market.
Snowflake's centralized storage serves as a single source of truth. With Virtual Warehouses, the same data can be queried by different services without any impact on each other: each warehouse scales independently but has access to the same data.
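This workload isolation can be sketched with two separate warehouses (names are hypothetical) that both read the same centralized data without contending for compute:

```sql
-- One warehouse per workload; each is an independent compute cluster
CREATE WAREHOUSE etl_wh WITH WAREHOUSE_SIZE = 'LARGE';
CREATE WAREHOUSE bi_wh  WITH WAREHOUSE_SIZE = 'SMALL';

-- A dashboard query on bi_wh is unaffected by heavy loads on etl_wh
USE WAREHOUSE bi_wh;
SELECT COUNT(*) FROM sales;
```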
Snowflake introduced a concept called 'Time Travel', which provides a snapshot of the data as it existed at a point in time. By default, Time Travel covers 24 hours; on the Enterprise Edition it can extend up to 90 days. Beyond that window, Snowflake provides its own fail-safe mechanism: in the event of disk failures, it offers 7 days of fail-safe protection. Let's look at the various factors that make Snowflake a better tool than its competitors.
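Time Travel is exposed directly in SQL via the AT clause and UNDROP; the table name and timestamp below are illustrative:

```sql
-- Query a table as it existed one hour ago (offset is in seconds)
SELECT * FROM orders AT(OFFSET => -3600);

-- Or as of a specific timestamp
SELECT * FROM orders AT(TIMESTAMP => '2023-01-15 08:00:00'::TIMESTAMP_LTZ);

-- Recover an accidentally dropped table within the retention window
UNDROP TABLE orders;
```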
Snowflake stores all its data in encrypted form and provides end-to-end encryption: data is encrypted both at rest on disk and in transit, so users can forget about building complex security models on their end. It supports two-factor authentication and federated authentication with single sign-on. Authorization is role-based, and predefined policies can be set up for limited access. Snowflake holds SOC 2 Type 2 certification on both AWS and Azure, and if users insist, an additional level of encryption can be provided across all network communications.
A major advantage of Snowflake is its cost. Although prices vary slightly between regions, the storage cost of 1 TB of data averages as low as $23 per month. Snowflake saves cost by compressing stored data at roughly a 3:1 ratio, and unlike Google BigQuery, which charges for uncompressed data, Snowflake charges only for the compressed data. This compression has no impact on performance, since most operations work through metadata. Compute is charged separately on a per-second basis, according to the Snowflake credits the warehouse consumes. Another advantage is zero-copy cloning, where we can clone the data but pay for the master data only once, eliminating the need for separate environments. The cost varies across Snowflake editions and the warehouse size selected for compute, but overall this is a much better solution, with the added scalability and agility of the cloud.
Snowflake's separation of the compute layer from storage boosts performance: CapSpecialty, an insurance provider in the US, was able to achieve 200x faster reporting and query 10 years of data in less than 15 minutes. Native to SQL, Snowflake addresses both structured and semi-structured data with no degradation in performance. We can specify a minimum and maximum number of clusters for each warehouse, and Snowflake automatically scales out for concurrency and scales up for performance, without any manual intervention. This instant elasticity gives customers an unlimited number of users with consistent performance, predictable pricing, and no overbuying.
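The scaling behavior described above is configured declaratively when the warehouse is created; the parameter values below are illustrative (multi-cluster warehouses require the Enterprise Edition):

```sql
-- Multi-cluster warehouse: scales out up to 4 clusters under concurrency
CREATE WAREHOUSE reporting_wh WITH
    WAREHOUSE_SIZE    = 'MEDIUM'
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4
    AUTO_SUSPEND      = 300   -- suspend after 5 idle minutes to save credits
    AUTO_RESUME       = TRUE;

-- Scaling up for heavier individual queries is a one-line change
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XLARGE';
```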
Instead of the heavy maintenance investment other tools require, Snowflake offers near-zero administration. We need not specify any tuning or indexes, as Snowflake takes care of everything automatically. The upfront installation costs of other tools are also removed from the picture, and Snowflake's fail-safe mechanism eliminates the need for separate backups.
Snowflake is climbing fast and making an impact on the analytics market with its exceptional architecture. The efficiency and flexibility it brings to the traditional data warehouse on the cloud are remarkable compared to its competitors, and these unique aspects make Snowflake a standout product to watch.