In-DB Transformation in Cloud Lakehouse

May 26, 2021

Introduction

Many parts of the data landscape have changed remarkably in the past few years. Growth in data volume and data transfer speed, together with advances in IT infrastructure and data processing, has drastically influenced data architecture decisions. With the migration to cloud data lakehouses (data warehouses and data lakes offered as SaaS), it is becoming easy to move away from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform). But before we accept that decision as the norm, we must understand the DNA of ELT that sets it apart from the traditional ETL approach: “in-DB transformation.”

In-DB transformations

The most significant and most discussed difference between cloud and on-prem systems is the “serverless” design of the cloud. Sizing an on-prem system is a complicated affair. Coupled with constantly evolving factors such as business user needs, technology evolution, and data volume, it becomes almost impossible to size your hardware correctly, i.e., neither oversized nor undersized. Nearly all cloud data lakehouses scale automatically, which means the workload gets the resources it requires.

In ELT, the process runs native SQL scripts or stored procedures, or uses tools like dbt (data build tool), to transform data from its raw format (i.e., a replica of the source) into a form that supports data analytics. These transformations can now be done in the DB without moving data across servers. Apart from pushing all the complex transformations into a massively parallel processing server, there are several other advantages to choosing in-DB transformation for cloud lakehouses.
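To make the pattern concrete, here is a minimal sketch of load-then-transform. It uses Python's built-in sqlite3 module as a stand-in for a lakehouse engine, which is purely an assumption so the example runs anywhere; the table and column names (raw_orders, orders_clean, amount_text) are hypothetical.

```python
import sqlite3

# SQLite stands in for the lakehouse engine here; this is an assumption
# made only so the sketch is runnable without any cloud account.
conn = sqlite3.connect(":memory:")

# "Load": land the source data as-is in a raw table (a replica of the source).
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_text TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "10.50", "us"), (2, "20.00", "US"), (3, "5.25", "de")],
)

# "Transform": the cleanup runs inside the database as plain SQL.
# No rows are fetched into the Python process for this step.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT order_id,
           CAST(amount_text AS REAL) AS amount,
           UPPER(country)            AS country
    FROM raw_orders
    """
)

# Analytics query against the transformed table.
revenue_by_country = conn.execute(
    "SELECT country, SUM(amount) FROM orders_clean "
    "GROUP BY country ORDER BY country"
).fetchall()
print(revenue_by_country)
```

The raw table is left untouched, so the transformed table can be rebuilt from it at any time.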

  1. ETL is Extract, Transform & Load 

In most ETL processes, the most time-consuming steps are “Extract” and “Load.” During these steps, data is moved either from the DB servers to the ETL servers or vice versa. With in-DB transformation, the data stays within the data lakehouse server and is not transferred between heterogeneous systems. “Read” and “Write” can happen within a few milliseconds, so the time spent moving data becomes insignificant under in-DB transformations.
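The difference in data movement can be sketched by running the same calculation both ways and counting the rows that cross the database boundary. This again assumes SQLite as a stand-in engine, and the names raw_sales, sales_total, etl_style, and in_db_style are hypothetical. An in-memory database has no network hop, so the sketch shows the shape of the two approaches, not their timings.

```python
import sqlite3


def make_db():
    """Create a small in-memory database with a raw table and an empty target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (id INTEGER, qty INTEGER, unit_price REAL)")
    conn.executemany(
        "INSERT INTO raw_sales VALUES (?, ?, ?)",
        [(i, i % 5 + 1, 2.0) for i in range(1000)],
    )
    conn.execute("CREATE TABLE sales_total (id INTEGER, total REAL)")
    return conn


def etl_style(conn):
    """Extract rows to the client, transform in Python, load them back."""
    rows = conn.execute("SELECT id, qty, unit_price FROM raw_sales").fetchall()
    transformed = [(row_id, qty * price) for row_id, qty, price in rows]
    conn.executemany("INSERT INTO sales_total VALUES (?, ?)", transformed)
    return len(rows) + len(transformed)  # rows that crossed the DB boundary


def in_db_style(conn):
    """One statement; the rows never leave the database engine."""
    conn.execute("INSERT INTO sales_total SELECT id, qty * unit_price FROM raw_sales")
    return 0  # no rows crossed the DB boundary


etl_conn, in_db_conn = make_db(), make_db()
rows_moved_etl = etl_style(etl_conn)
rows_moved_in_db = in_db_style(in_db_conn)

check = "SELECT SUM(total) FROM sales_total"
etl_sum = etl_conn.execute(check).fetchone()[0]
in_db_sum = in_db_conn.execute(check).fetchone()[0]
print(rows_moved_etl, rows_moved_in_db, etl_sum == in_db_sum)
```

Both paths produce the same totals; only the ETL-style path has to ship every row out and back in.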

  2. Does size matter for transformation?

ETL systems are optimized for transforming data, not for storing it. So, when we work with a large table, a cache is built within the ETL tool. Every transformation step works with the data in that cache, and transformation generally tends to become much slower as the data grows. With in-DB transformation, the data is processed as a temp view in the DB, so execution speed is not overly dependent on the size of the data.
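As a rough illustration of the temp view idea, the sketch below defines a transformation step as a temporary view and queries it. SQLite is assumed as the engine only for runnability, and raw_events and v_logins are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id INTEGER, event TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "login", "2021-05-01"),
        (1, "purchase", "2021-05-02"),
        (2, "login", "2021-05-02"),
        (2, "login", "2021-05-03"),
    ],
)

# A temp view stores only the query definition, not a copy of the rows,
# so no client-side cache of the table is built.
conn.execute(
    """
    CREATE TEMP VIEW v_logins AS
    SELECT user_id, ts FROM raw_events WHERE event = 'login'
    """
)

# Downstream steps read from the view; the engine evaluates it on demand.
logins_per_user = conn.execute(
    "SELECT user_id, COUNT(*) FROM v_logins GROUP BY user_id ORDER BY user_id"
).fetchall()
print(logins_per_user)
```

The view disappears when the connection closes, which suits intermediate steps that nobody needs to query afterwards.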

  3. Modular & Parallel Transformation

In-DB transformations can be broken down into modular items. This gives us the option to build modular transformations that run in parallel and efficiently, without replicating or transferring the data.
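One way to picture modular transformations is as small SQL "models" with declared dependencies, similar in spirit to how dbt organizes a project. The sketch below is not dbt's API; it uses Python's standard graphlib module (Python 3.9+) to group hypothetical models into batches whose members do not depend on each other. Here each batch runs sequentially against SQLite, on the assumption that a real lakehouse engine could run the members of a batch concurrently.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE raw_customers (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 2, 50.0), (3, 1, -10.0)],
)
conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", [(1, "emea"), (2, "apac")])

# Each hypothetical model: (set of upstream models, SQL that builds it).
models = {
    "stg_orders": (
        set(),
        "CREATE VIEW stg_orders AS "
        "SELECT id, customer_id, amount FROM raw_orders WHERE amount > 0",
    ),
    "stg_customers": (
        set(),
        "CREATE VIEW stg_customers AS "
        "SELECT id, UPPER(region) AS region FROM raw_customers",
    ),
    "revenue_by_region": (
        {"stg_orders", "stg_customers"},
        "CREATE VIEW revenue_by_region AS "
        "SELECT c.region AS region, SUM(o.amount) AS revenue "
        "FROM stg_orders o JOIN stg_customers c ON c.id = o.customer_id "
        "GROUP BY c.region",
    ),
}

sorter = TopologicalSorter({name: deps for name, (deps, _sql) in models.items()})
sorter.prepare()

batches = []  # models within one batch have no dependencies on each other
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    for name in ready:
        conn.execute(models[name][1])  # build the model inside the DB
        sorter.done(name)

revenue = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(batches)
print(revenue)
```

The two staging models land in the same batch because neither reads from the other, and every model reads the raw tables in place rather than from a copy.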

  4. Low maintenance

As the raw data is available all the time, rerunning historical data loads, adding new columns, or creating new calculations is much easier and faster to handle when using in-DB transformations.
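A small sketch of why this is cheap: because the raw table stays in the database, adding a calculated column amounts to changing the SELECT and rebuilding the derived table. SQLite is assumed as the engine, and rebuild, raw_orders, and orders_report are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, net REAL, tax REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 100.0, 20.0), (2, 50.0, 10.0)],
)


def rebuild(connection, select_sql):
    """Drop and recreate the derived table from the raw data still in the DB."""
    connection.execute("DROP TABLE IF EXISTS orders_report")
    connection.execute(f"CREATE TABLE orders_report AS {select_sql}")


def column_names(connection):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in connection.execute("PRAGMA table_info(orders_report)")]


# Version 1 of the report.
rebuild(conn, "SELECT id, net FROM raw_orders")
columns_v1 = column_names(conn)

# Version 2: a new calculation is requested, so we rerun over the full history.
rebuild(conn, "SELECT id, net, net + tax AS gross FROM raw_orders")
columns_v2 = column_names(conn)

gross_values = [
    row[0] for row in conn.execute("SELECT gross FROM orders_report ORDER BY id")
]
print(columns_v1, columns_v2, gross_values)
```

No re-extraction from the source system is needed for the second version, since the history is already in the raw table.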

  5. Real-time data processing

Since every step in ELT is processed at high speed inside the DB, it becomes easier to process real-time data when using in-DB transformations. Also, data type support depends on the data lakehouse rather than on the ETL tool that we use.

The migration from on-premises systems to cloud data lakehouses is changing how we do data engineering and analytics. When we combine scalability, massively parallel processing, stellar performance, separated storage and compute resources, optimization for analytical workloads, automatic indexing and compression, query result caching, and much more, it becomes an easy decision to move from ETL to ELT when we switch from on-prem to cloud. In-DB transformations for cloud data lakehouses are continuously evolving, and they look like a solution that is ready for today and built to handle tomorrow’s data.

Reach out to us here today if you are interested in evaluating which ETL or ELT tool is right for you.  

To learn more about Visual BI’s recommendation on Architecture and Strategy, click here.  

Reach out to us at solutions@visualbi.com for a personalized offer covering one or more of the following:  

  • Strategy Workshops (<1 day)  
  • Proof of Concepts (1-3 weeks)  
  • Custom Implementation and Training  

To know more about Visual BI Solutions’ Data Architecture and Engineering offering, click here.


Copyright © Visual BI Solutions Inc.
