Continuing from our previous blog, ‘Can Snowflake fit your pocket as Complete DW Solution’, we now discuss the pros and cons of using Snowflake as a data warehouse (DW) solution.
- Zero-Copy Cloning
With this distinctive Snowflake feature, we no longer need to maintain separate physical copies of data in each environment (SBX, QA and PRD). We can simply clone the existing data without paying for a second copy of the storage, and start using the clone for testing or other purposes. A clone initially shares the source table's micro-partitions, and storage is charged only for micro-partitions that are written or rewritten after the clone is created. So even when we enhance a feature, say by adding an extra column to a cloned table, we pay only for the new or changed data and not for a full replica of the table. This lowers our cost, keeps a single source of truth across environments, and removes much of the burden of transporting data between environments.
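To make the storage argument concrete, here is a minimal Python model of zero-copy cloning. It assumes a simplified picture in which a table is a list of immutable micro-partition IDs and a clone starts out referencing the same IDs. The `Table` class, `clone()` and `billable_partitions()` are hypothetical names for illustration, not a Snowflake API; in Snowflake itself the clone is created with SQL such as `CREATE TABLE qa_orders CLONE prd_orders;`.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical model: micro-partitions are immutable and identified by an ID.
_next_partition_id = count(1)


@dataclass
class Table:
    name: str
    partitions: list = field(default_factory=list)

    def load(self, n_partitions: int) -> None:
        """Write n brand-new micro-partitions (e.g. an initial data load)."""
        self.partitions.extend(next(_next_partition_id) for _ in range(n_partitions))

    def clone(self, name: str) -> "Table":
        """Zero-copy clone: copy only the list of partition references, not the data."""
        return Table(name, list(self.partitions))

    def update_partition(self, index: int) -> None:
        """Micro-partitions are immutable, so a change writes a replacement partition."""
        self.partitions[index] = next(_next_partition_id)


def billable_partitions(tables: list) -> int:
    """Each distinct partition is stored (and billed) once, however many tables reference it."""
    return len({pid for t in tables for pid in t.partitions})


prd = Table("prd_orders")
prd.load(100)
qa = prd.clone("qa_orders")
print(billable_partitions([prd, qa]))  # 100: the clone adds no storage yet
qa.update_partition(0)
qa.update_partition(1)
print(billable_partitions([prd, qa]))  # 102: only the rewritten partitions are new
```

The model also shows the trade-off: once most partitions of a clone have been rewritten, the clone costs roughly as much as a full copy.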
- Scalability
Another advantage of moving data to the cloud is scalability. Because Snowflake separates the storage layer from the compute layer, we can scale compute independently of storage: we can resize a virtual warehouse (scale up) to speed up heavy queries, and a multi-cluster warehouse can automatically add or remove clusters (scale out) as the number of concurrent users and queries changes. Snowflake markets this as Instant Elasticity: a virtually unlimited number of users with consistent performance, predictable pricing and no need to overbuy capacity up front.
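The scale-out behaviour can be approximated with a simple model. Assumptions: each cluster runs up to `MAX_CONCURRENCY_LEVEL` queries at once (8 is Snowflake's default for that warehouse parameter), and the hypothetical `min_clusters`/`max_clusters` arguments play the role of a multi-cluster warehouse's `MIN_CLUSTER_COUNT` and `MAX_CLUSTER_COUNT` settings. Snowflake's real scaling policies (Standard and Economy) also consider queueing and expected load, so this is a sketch, not the actual algorithm.

```python
import math

MAX_CONCURRENCY_LEVEL = 8  # assumed queries per cluster (Snowflake's default for this parameter)


def clusters_needed(concurrent_queries: int, min_clusters: int = 1, max_clusters: int = 4) -> int:
    """Approximate how many clusters a multi-cluster warehouse would run for a given load."""
    wanted = math.ceil(concurrent_queries / MAX_CONCURRENCY_LEVEL)
    # The warehouse never runs fewer than min_clusters or more than max_clusters.
    return max(min_clusters, min(max_clusters, wanted))


def queued_queries(concurrent_queries: int, clusters: int) -> int:
    """Queries beyond the combined capacity of the running clusters wait in a queue."""
    return max(0, concurrent_queries - clusters * MAX_CONCURRENCY_LEVEL)


for load in (3, 20, 50):
    n = clusters_needed(load)
    print(f"{load} concurrent queries -> {n} cluster(s), {queued_queries(load, n)} queued")
```

With the assumed cap of four clusters, the 50-query burst still leaves 18 queries queued; raising the cluster limit (scale out) or the warehouse size (scale up) are the two levers, and both increase credit consumption.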
- Near-Zero Management with Optimized Storage and Caching Techniques
Snowflake has no indexes to create or maintain, and most tuning is handled automatically. It also has an effective caching mechanism with three layers: the result cache, the local disk (warehouse) cache and remote disk storage. If a new query is identical to one that ran in the past 24 hours and the underlying data has not changed, Snowflake returns the persisted result immediately without using any warehouse compute. Data is also stored in compressed, columnar micro-partitions, and the metadata kept for each micro-partition (such as the range of values in each column) lets Snowflake skip partitions that cannot match a query's filters, which further speeds up retrieval.
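The result-cache rule described above (same query, within 24 hours, unchanged data) can be modelled in a few lines. This is a simplified sketch: `ResultCache` is a hypothetical class, the integer `table_version` stands in for Snowflake's internal tracking of data changes, and the model ignores details such as the 24-hour window being reset each time a result is reused (up to 31 days) and the rule that queries using functions like `RANDOM()` are not served from the cache.

```python
from dataclasses import dataclass
from typing import Callable

RESULT_TTL_SECONDS = 24 * 60 * 60  # a persisted result is reusable for 24 hours


@dataclass
class CachedResult:
    rows: list
    stored_at: float  # seconds on a clock shared by all calls
    table_version: int  # version of the underlying data when the result was computed


class ResultCache:
    """Hypothetical model of a result cache keyed by the exact query text."""

    def __init__(self) -> None:
        self._entries = {}  # query text -> CachedResult
        self.computations = 0  # number of times the query actually had to run

    def run(self, sql: str, now: float, table_version: int, compute: Callable[[], list]) -> list:
        entry = self._entries.get(sql)
        if (
            entry is not None
            and now - entry.stored_at < RESULT_TTL_SECONDS
            and entry.table_version == table_version
        ):
            return entry.rows  # cache hit: no warehouse compute needed
        # Miss, expired entry or changed data: run the query and store the new result.
        self.computations += 1
        rows = compute()
        self._entries[sql] = CachedResult(rows, now, table_version)
        return rows


def total_by_region() -> list:
    return [("EU", 100), ("US", 250)]


cache = ResultCache()
q = "SELECT region, SUM(amount) FROM sales GROUP BY region"
cache.run(q, now=0, table_version=1, compute=total_by_region)  # first run: computed
cache.run(q, now=3_600, table_version=1, compute=total_by_region)  # same query and data: cache hit
cache.run(q, now=7_200, table_version=2, compute=total_by_region)  # data changed: recomputed
cache.run(q, now=97_200, table_version=2, compute=total_by_region)  # 25 hours later: recomputed
print(cache.computations)  # 3
```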
- Security
Snowflake stores all of its data in encrypted form. Encryption is end to end: data is encrypted both at rest on disk and in transit, so users do not have to build complex encryption schemes of their own. Snowflake supports multi-factor authentication (MFA) and federated authentication with single sign-on (SSO). Authorization is role-based: privileges are granted to roles, roles are granted to users, and we can define roles that give only limited access. Snowflake has SOC 2 Type II certification on both AWS and Azure.
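Role-based authorization can be illustrated with a small model: privileges are granted to roles, roles can be granted to other roles (so a parent role inherits its child's privileges), and users are granted roles. The role names and helper functions are hypothetical; in Snowflake the equivalent steps are SQL statements such as `GRANT SELECT ON TABLE ... TO ROLE analyst` and `GRANT ROLE analyst TO USER ...`. For simplicity, the model lets a user exercise every role granted to them at once, whereas a Snowflake session works under a primary role (optionally with secondary roles).

```python
from collections import defaultdict
from typing import Optional

# Hypothetical in-memory grant tables.
role_privileges = defaultdict(set)  # role -> {(privilege, object), ...}
role_children = defaultdict(set)  # parent role -> roles granted to it
user_roles = defaultdict(set)  # user -> roles granted to the user


def grant_privilege(privilege: str, obj: str, role: str) -> None:
    role_privileges[role].add((privilege, obj))


def grant_role(child: str, parent: str) -> None:
    """Granting a role to another role makes the parent inherit the child's privileges."""
    role_children[parent].add(child)


def effective_privileges(role: str, seen: Optional[set] = None) -> set:
    """Collect a role's own privileges plus everything inherited through the hierarchy."""
    seen = set() if seen is None else seen
    if role in seen:  # avoid revisiting a role reachable by two paths
        return set()
    seen.add(role)
    privileges = set(role_privileges[role])
    for child in role_children[role]:
        privileges |= effective_privileges(child, seen)
    return privileges


def can(user: str, privilege: str, obj: str) -> bool:
    """Simplification: every role granted to the user is treated as active."""
    return any((privilege, obj) in effective_privileges(role) for role in user_roles[user])


grant_privilege("SELECT", "sales.orders", "analyst")
grant_privilege("INSERT", "sales.orders", "engineer")
grant_role("analyst", "engineer")  # engineers can do everything analysts can
user_roles["alice"].add("analyst")
user_roles["bob"].add("engineer")

print(can("alice", "SELECT", "sales.orders"))  # True
print(can("alice", "INSERT", "sales.orders"))  # False
print(can("bob", "SELECT", "sales.orders"))  # True, inherited from analyst
```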
- Data Sharing (Single source of truth)
With Snowflake's Secure Data Sharing, we no longer need to worry about data silos. Tables, secure views and other database objects can be shared directly from one Snowflake account to another without copying or moving the data. All shared objects are read-only: the provider (the account that shares its data) grants access to the objects, and consumers can query them but not modify them. Snowflake also supports reader accounts, a special type of account that a provider creates so that consumers without their own Snowflake account can consume the shared data from that single provider account.
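On the provider side, sharing comes down to a handful of SQL statements. The Python helpers below only build those statements as strings, so they run anywhere; the share, database, schema, table and account names are placeholders, and in practice each statement would be executed by a role with the required privileges (for example through a cursor from the Snowflake Connector for Python).

```python
def provider_share_sql(share: str, database: str, schema: str,
                       tables: list, consumer_accounts: list) -> list:
    """Provider side: create a share, expose objects read-only, and add consumer accounts."""
    statements = [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    # Only SELECT is granted on the tables, so consumers can query but never modify them.
    statements += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};"
        for table in tables
    ]
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumer_accounts)};")
    return statements


def consumer_share_sql(local_database: str, provider_account: str, share: str) -> str:
    """Consumer side: mount the share as a read-only database; no data is copied."""
    return f"CREATE DATABASE {local_database} FROM SHARE {provider_account}.{share};"


for stmt in provider_share_sql("sales_share", "sales_db", "public", ["orders"], ["partner_acct"]):
    print(stmt)
print(consumer_share_sql("partner_sales", "provider_acct", "sales_share"))
```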
- Integration with other tools
Snowflake has an extensive set of clients, connectors and drivers for third-party tools and technologies. These include the SnowSQL command-line client, the Snowflake Connector for Python, the Snowflake Connector for Spark and the Snowflake Connector for Kafka, along with the Node.js Driver, Go Snowflake Driver, .NET Driver, JDBC Driver and ODBC Driver. As a result, Snowflake can be integrated with most common languages and tools, and its documentation covers connecting through each driver and connector in detail.
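As an example of one of these connectors, here is a minimal sketch using the Snowflake Connector for Python (`pip install snowflake-connector-python`) to connect and run a single query. The environment-variable names and the `COMPUTE_WH` fallback warehouse are our own assumptions; `snowflake.connector.connect()`, `cursor()`, `execute()` and `fetchone()` are the connector's standard DB-API style calls. The import sits inside the function so the file still loads where the connector is not installed.

```python
import os


def connection_params() -> dict:
    """Read connection settings from environment variables (the names are our own convention)."""
    return {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        # Assumed fallback; use whichever warehouse your account actually has.
        "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
    }


def snowflake_version() -> str:
    """Connect with the Snowflake Connector for Python and return the server version."""
    import snowflake.connector  # requires: pip install snowflake-connector-python

    with snowflake.connector.connect(**connection_params()) as conn:
        cur = conn.cursor()
        try:
            cur.execute("SELECT CURRENT_VERSION()")
            return cur.fetchone()[0]
        finally:
            cur.close()
```

Keeping credentials out of the source (here via environment variables) is the main design choice; the other drivers follow the same connect, execute and fetch pattern in their own languages.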
- Compatibility with ERP Systems (Planning)
Companies gain an advantage when they run a full-stack ERP-DW combination for their BI needs (say, SAP ECC with SAP HANA/BW). When we integrate Snowflake with an existing ERP system, however, we may run into compatibility issues and considerable development effort. We may need workarounds or third-party tools to achieve full compatibility.
- Row-Level and Column-Level Security (RLS and CLS)
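Unlike some other data warehouses, Snowflake does not enforce row-level security out of the box. Row access policies and dynamic data masking policies (for column-level security) are available in Enterprise Edition and above, but they still have to be designed and maintained by us; without them, we need to maintain a security (entitlement) table ourselves and apply it, for example, through secure views. Either way, whenever a user is added or removed, someone has to intervene manually to grant or revoke that user's access.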
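The security-table approach can be sketched as follows. The entitlement table, the sample rows and the helper functions are hypothetical; in Snowflake the same filter would typically live in a secure view (or a row access policy) that joins the data to the entitlement table on `CURRENT_USER()`. The sketch also makes the maintenance burden visible: every grant and revoke is an explicit, manual step.

```python
from collections import defaultdict

# Hypothetical entitlement (security) table: which regions each user may see.
entitlements = defaultdict(set)  # user -> {region, ...}

sales = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 300},
    {"order_id": 3, "region": "APAC", "amount": 75},
]


def grant_region(user: str, region: str) -> None:
    """Manual onboarding step: someone must add a row for every user/region pair."""
    entitlements[user].add(region)


def revoke_user(user: str) -> None:
    """Manual offboarding step: forgetting it leaves a departed user with access."""
    entitlements.pop(user, None)


def visible_rows(user: str) -> list:
    """Row-level filter, similar in spirit to a secure view joined on CURRENT_USER()."""
    allowed = entitlements.get(user, set())
    return [row for row in sales if row["region"] in allowed]


grant_region("alice", "EU")
grant_region("bob", "US")
grant_region("bob", "APAC")
print([r["order_id"] for r in visible_rows("alice")])  # [1]
print([r["order_id"] for r in visible_rows("bob")])  # [2, 3]
revoke_user("bob")
print(visible_rows("bob"))  # []
```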
- Rogue Queries can shoot up the cost
In an on-premises environment, a rogue query might eventually bring the system down. In Snowflake, the query simply keeps running and consuming compute, which increases our costs. We can put guardrails in place, such as resource monitors that cap the credits a warehouse may consume and statement timeouts that cancel long-running queries, but a runaway query can still use up the allotted limit, wasting credits and driving costs higher.
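The cost side is simple arithmetic. The sketch below assumes standard warehouse pricing in credits (X-Small = 1 credit per hour, doubling with each size) and per-second billing with a 60-second minimum each time a warehouse starts; the price per credit depends on edition, cloud and region, so it is left out. `query_credits()` and `capped_by_monitor()` are hypothetical helpers; the guardrails they imitate are the real `STATEMENT_TIMEOUT_IN_SECONDS` parameter and resource monitors. Note that a monitor's plain `SUSPEND` action lets running queries finish, so only `SUSPEND_IMMEDIATE` actually stops a runaway query, and because monitors are checked periodically, usage can slightly exceed the quota.

```python
from typing import Optional

# Credits per hour for standard virtual warehouses: each size up doubles the rate.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128}
MIN_BILLED_SECONDS = 60  # each warehouse start or resume is billed for at least 60 seconds


def query_credits(size: str, runtime_s: float, timeout_s: Optional[float] = None) -> float:
    """Credits for one warehouse running a single query (ignores other load and idle time)."""
    if timeout_s is not None:
        runtime_s = min(runtime_s, timeout_s)  # a statement timeout cancels the query here
    billed_s = max(runtime_s, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_s / 3600


def capped_by_monitor(credits: float, quota_remaining: float) -> float:
    """Rough effect of a resource monitor whose SUSPEND_IMMEDIATE trigger fires at the quota."""
    return min(credits, quota_remaining)


runaway = 10 * 3600  # a rogue query that would run for 10 hours
print(query_credits("XL", runaway))  # 160.0
print(query_credits("XL", runaway, timeout_s=3600))  # 16.0
print(capped_by_monitor(query_credits("XL", runaway), quota_remaining=50.0))  # 50.0
```

A statement timeout bounds the damage from any single query, while a resource monitor bounds total spend per interval, which is why the two are usually combined.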