Azure Data Lake Storage Gen2 is Microsoft’s latest version of cloud-based big data storage. The previous version, Azure Data Lake Storage Gen1, did not offer hot/cold storage tiers or redundant storage options. Azure Blob storage did support hot and cold tiers, but it lacked features such as directories and file-level security, which Azure Data Lake Storage Gen1 provided. Azure Data Lake Storage Gen2 was introduced to close this gap in storage options and features.
Azure Data Lake Storage Gen2 is built on Azure Blob storage as its foundation. It combines features of Azure Data Lake Storage Gen1, such as file system semantics, directory- and file-level security, and scalability, with the low-cost tiered storage and high availability/disaster recovery capabilities of Azure Blob storage.
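To make "file system semantics" concrete, here is a toy Python model (not the Azure SDK) contrasting a flat namespace, where a "directory" is only a shared prefix of blob names, with a hierarchical namespace, where a directory is a real node in a tree. The class names and the operation counting are hypothetical and only illustrate the idea: renaming a directory in the flat model touches every blob under the prefix, while in the hierarchical model it is a single metadata change.

```python
from dataclasses import dataclass, field
from typing import Optional


class FlatNamespace:
    """Toy Blob-style store: 'directories' are only prefixes of blob names."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def rename_dir(self, old: str, new: str) -> int:
        """Rename every blob under the prefix; return the number of blob operations."""
        prefix = old.rstrip("/") + "/"
        ops = 0
        # Snapshot the matching names first because the dict is modified in the loop.
        for name in [n for n in self.blobs if n.startswith(prefix)]:
            new_name = new.rstrip("/") + "/" + name[len(prefix):]
            self.blobs[new_name] = self.blobs.pop(name)  # one copy+delete per blob
            ops += 1
        return ops


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    data: Optional[bytes] = None


class HierarchicalNamespace:
    """Toy ADLS Gen2-style store: directories are real nodes in a tree."""

    def __init__(self) -> None:
        self.root = Node()

    def _walk(self, parts: list[str], create: bool = False) -> Node:
        node = self.root
        for part in parts:
            if part not in node.children:
                if not create:
                    raise KeyError(part)
                node.children[part] = Node()
            node = node.children[part]
        return node

    def put(self, path: str, data: bytes) -> None:
        *dirs, name = path.strip("/").split("/")
        self._walk(dirs, create=True).children[name] = Node(data=data)

    def rename_dir(self, old: str, new: str) -> int:
        *old_parent, old_name = old.strip("/").split("/")
        *new_parent, new_name = new.strip("/").split("/")
        src = self._walk(old_parent)
        dst = self._walk(new_parent, create=True)
        # Re-link the whole subtree under its new parent in one step.
        dst.children[new_name] = src.children.pop(old_name)
        return 1

    def exists(self, path: str) -> bool:
        try:
            self._walk(path.strip("/").split("/"))
            return True
        except KeyError:
            return False


flat, tree = FlatNamespace(), HierarchicalNamespace()
for i in range(1000):
    flat.put(f"raw/2019/07/part-{i}.csv", b"...")
    tree.put(f"raw/2019/07/part-{i}.csv", b"...")

print(flat.rename_dir("raw/2019", "archive/2019"))  # 1000
print(tree.rename_dir("raw/2019", "archive/2019"))  # 1
```

The point of the model is the cost shape, not the exact numbers: with a flat namespace, directory-level operations grow with the number of files, whereas a hierarchical namespace can treat them as metadata operations.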
Azure Data Lake Gen1 vs Azure Data Lake Gen2
| Azure Data Lake Gen1 | Azure Data Lake Gen2 |
|---|---|
| File system storage in which data is distributed in blocks in a hierarchical file system. | Combines file system storage (for performance and security) with object storage (for scalability). |
| Hot/cold storage tiers not supported | Supports hot/cold storage tiers |
| Redundant storage not supported | Supports redundant storage |
| Azure Data Lake Analytics supported | Azure Data Lake Analytics not supported (as of 2 July 2019) |
Creating Azure Data Lake Gen2 and Converting Blob Storage to Gen 2
1. Go to All resources -> click Add -> choose Storage account -> set Account kind to StorageV2.
2. Once the storage account is created, go to Configuration -> enable Hierarchical Namespace.
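If you prefer to script these settings rather than click through the portal, they correspond to properties of the Azure Resource Manager `Microsoft.Storage/storageAccounts` resource: `kind` set to `StorageV2` and `isHnsEnabled` set to `true`. The sketch below only builds the JSON request body in plain Python and does not call Azure; the helper name `build_storage_account_body` and the default location and SKU are hypothetical choices for illustration.

```python
import json


def build_storage_account_body(location: str = "westeurope",
                               sku: str = "Standard_LRS",
                               access_tier: str = "Hot") -> dict:
    """Hypothetical helper: ARM request body for an ADLS Gen2-capable account."""
    if access_tier not in ("Hot", "Cool"):
        raise ValueError("access_tier must be 'Hot' or 'Cool'")
    return {
        "location": location,            # assumed region, change as needed
        "kind": "StorageV2",             # 'Account kind: StorageV2' in the portal
        "sku": {"name": sku},            # redundancy option, e.g. LRS or GRS
        "properties": {
            "isHnsEnabled": True,        # 'Hierarchical Namespace: Enabled'
            "accessTier": access_tier,   # default access tier for the account
        },
    }


print(json.dumps(build_storage_account_body(), indent=2))
```

The resulting body could then be sent with any ARM client; which client to use is left open here.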
Azure Data Lake Gen 2 provides different access tiers for storing data.
When Data Lake Gen 2 is created with the Hot access tier, the files in the storage are optimized for frequent access. In the Hot tier, storage cost is higher whereas access cost is lower, so if the files are not accessed frequently, you end up paying more than necessary.
The Cool access tier in Data Lake Gen 2 (Storage Account V2) is intended for files that are not accessed frequently. For example, monthly or annual reports that are consumed once a month or once a year are rarely accessed, so the higher access cost is incurred only occasionally while the lower storage cost applies all the time. In the Cool tier, storage cost is lower whereas access cost is higher, so if these files are accessed frequently, you end up paying more.
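The Hot/Cool trade-off can be checked with simple arithmetic. The sketch below compares the monthly cost of the same data in both tiers; the prices are made-up placeholders, not Azure list prices, and the model ignores writes and any other charges, so substitute current figures for your region before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    storage_per_gb: float  # cost per GB stored per month
    read_per_10k: float    # cost per 10,000 read operations


# HYPOTHETICAL prices for illustration only; real prices vary by region and redundancy.
HOT = TierPrice(storage_per_gb=0.020, read_per_10k=0.005)
COOL = TierPrice(storage_per_gb=0.010, read_per_10k=0.050)


def monthly_cost(tier: TierPrice, size_gb: float, reads: int) -> float:
    """Storage cost plus read cost for one month."""
    return size_gb * tier.storage_per_gb + reads / 10_000 * tier.read_per_10k


def cheaper_tier(size_gb: float, reads: int) -> str:
    hot = monthly_cost(HOT, size_gb, reads)
    cool = monthly_cost(COOL, size_gb, reads)
    return "Hot" if hot < cool else "Cool"


# Monthly reports: 500 GB read about 1,000 times a month.
print(cheaper_tier(500, 1_000))      # Cool
# The same 500 GB read 5 million times a month.
print(cheaper_tier(500, 5_000_000))  # Hot
```

With these placeholder prices the break-even point is a little over one million reads per month for 500 GB; the real break-even depends entirely on the actual prices.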
You can choose the access tier at the time of creating the Storage Account V2 (Data Lake Gen 2).
In Blob storage, you can also change the access tier at the object level after the storage account is created; however, this was not available in Azure Data Lake Gen 2 at general availability.
To convert an existing Blob Storage Account V1 to Gen 2, upgrade the storage account from V1 to V2 by clicking the Upgrade button under Configuration, and then enable the Hierarchical Namespace under Data Lake Gen 2.
Use cases of Blob Storage and Azure Data Lake Gen 2
Blob storage is a good fit when you only store backup files, images, and videos that see very few transactions; keeping them in Blob storage may reduce your transaction cost. When comparing the transaction cost of Gen 2 and Blob storage, Gen 2 transaction cost is slightly higher due to the overhead of the hierarchical namespace. However, the storage cost for Blob storage and Gen 2 is the same. Choose the storage based on your usage, for example analytical versus non-analytical use cases.
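A similar back-of-the-envelope check applies to the Blob versus Gen 2 decision. Following the paragraph above, this sketch assumes identical storage cost for both and models the namespace overhead as a multiplier on the per-transaction price; the prices and the 1.3 multiplier are hypothetical placeholders, not Azure figures.

```python
# HYPOTHETICAL figures for illustration only; check the Azure pricing page for real ones.
STORAGE_PER_GB = 0.02     # assumed identical for Blob storage and Gen 2
BLOB_PER_10K_OPS = 0.05   # assumed Blob transaction price per 10,000 operations
GEN2_OVERHEAD = 1.3       # assumed markup for hierarchical-namespace transactions


def monthly_costs(size_gb: float, ops: int) -> dict:
    """Monthly cost of the same workload on Blob storage and on Gen 2."""
    storage = size_gb * STORAGE_PER_GB
    blob_tx = ops / 10_000 * BLOB_PER_10K_OPS
    return {
        "blob": storage + blob_tx,
        "gen2": storage + blob_tx * GEN2_OVERHEAD,
    }


# Backups: 2,000 GB with few transactions; the gap is a few cents.
print(monthly_costs(2_000, 20_000))
# Heavy analytical scanning of the same data; the transaction overhead becomes visible.
print(monthly_costs(2_000, 50_000_000))
```

The sketch covers cost only; for analytical workloads the directory semantics and ACLs of Gen 2 may well justify the extra transaction cost.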
Pros of Azure Data Lake Gen 2 over Gen 1
- Azure Data Lake Gen 2 combines file system storage with the object storage of Blob storage, which gives the flexibility to store Excel files, images, videos, etc.
- The hierarchical file system provides better query performance in ADLS Gen 2.
- Object storage provides better scalability and is cost-effective.
- Granular security at the file and directory level can be achieved with Role-Based Access Control (RBAC) and Access Control Lists (ACLs).
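ADLS Gen 2 ACLs are POSIX-style and can be written in a short form such as `user::rwx,group::r-x,other::---`. The simplified sketch below parses such a string and checks whether a principal has a given permission, applying the mask to named users and groups as in POSIX. It is a toy model: it ignores Azure RBAC role assignments, superuser access, and default ACLs, and the principal names (`alice`, `analysts`, `svc`) are hypothetical stand-ins for real Azure AD object IDs.

```python
def parse_acl(acl: str) -> dict:
    """Parse a short-form ACL into {(kind, qualifier): perms}, skipping default entries."""
    entries = {}
    for entry in acl.split(","):
        if entry.startswith("default:"):
            continue  # default ACLs only affect newly created children
        kind, qualifier, perms = entry.split(":")
        entries[(kind, qualifier)] = perms
    return entries


def has_permission(acl: str, user: str, groups: set, owner: str,
                   owning_group: str, wanted: str) -> bool:
    """Simplified POSIX-style check; `wanted` is e.g. 'r', 'w', or 'rx'."""
    entries = parse_acl(acl)
    mask = entries.get(("mask", ""), "rwx")

    def grants(perms: str, apply_mask: bool = False) -> bool:
        return all(p in perms and (not apply_mask or p in mask) for p in wanted)

    if user == owner:  # owning user: not masked
        return grants(entries[("user", "")])
    if ("user", user) in entries:  # named user: masked
        return grants(entries[("user", user)], apply_mask=True)
    group_perms = [perms for (kind, q), perms in entries.items()
                   if kind == "group" and ((q == "" and owning_group in groups) or q in groups)]
    if group_perms:  # any matching group entry may grant access (masked)
        return any(grants(p, apply_mask=True) for p in group_perms)
    return grants(entries[("other", "")])


acl = "user::rwx,user:alice:rwx,group::r-x,mask::r-x,other::---"
print(has_permission(acl, "alice", set(), "svc", "analysts", "r"))        # True
print(has_permission(acl, "alice", set(), "svc", "analysts", "w"))        # False (masked)
print(has_permission(acl, "bob", {"analysts"}, "svc", "analysts", "rx"))  # True
print(has_permission(acl, "eve", set(), "svc", "analysts", "r"))          # False
```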
A diagram to illustrate the Azure Data Lake Gen 2
Cons of Azure Data Lake Gen 2 (updates expected soon for the features below)
- Snapshots and soft delete, which are available in Azure storage, are not available in Gen 2
- Object-level storage tiers (such as hot/cold/archive) and lifecycle management are not available in Gen 2
- Direct connectivity from Power BI or Azure Analysis Services is not available; however, Power BI Dataflows can connect to Azure Data Lake Gen 2
- Integration with Azure Data Lake Analytics (U-SQL) is not available as of now in Gen 2
For more information on the limitations, please refer to the link below:
Pricing of Azure Data Lake Gen 2
Compared with Gen 1, Azure Data Lake Gen 2 is priced at half the price of Gen 1.
Upgrading Azure Data Lake Gen 1 to Gen 2
Please refer to the Microsoft recommended practice for upgrading Gen 1 to Gen 2.