Power BI Dataflow brings self-service data preparation to the Power BI service. It allows large-scale, complex, and reusable data preparation to be performed directly in the Power BI service, with the results stored in Azure Data Lake Storage Gen2.
Power BI Dataflow
Power BI Dataflow is user-friendly and leverages the familiar Power Query Editor from Power BI. With Power BI datasets, we can import large amounts of data and schedule frequent refreshes in the service. However, loading large historical datasets into Power BI without an ETL process and a data warehouse is tedious, and we should not expect agile performance as those datasets grow. To overcome this, Microsoft provides an excellent solution: Power BI Dataflow. It handles large-scale data preparation by storing data in Azure Data Lake Storage Gen2, which is designed for even the largest datasets.
Each table/query is stored as an entity inside the dataflow, and each entity can be scheduled for incremental refresh independently. Power BI can consume any number of dataflows as data sources for creating datasets, reports, and dashboards.
Let’s take the scenario of storing historical data without ETL and a data warehouse. Using Power BI datasets alone, the only option is to reload the entire dataset, truncating the older data it contains, and then schedule the refresh in the service to load the daily transactional data. The dataset grows each day, and performance continues to worsen as the data accumulates.
With a Power BI Dataflow, we can handle this much more smoothly. Before beginning, ensure the workspace is enabled for dataflows.
Note: Incremental refresh of entities in Power BI dataflow is allowed only for Premium users.
Incremental refresh for Premium users
- Create a Dataflow
- Click on Workspace -> Create -> Dataflow
Create two entities: one for storing transactional data and another for storing historical data.
Entity for transactional data
This entity always stores data for the current year. Once the entity is created, schedule it to refresh daily as needed, so that the incremental refresh is initiated.
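In the dataflow’s Power Query Editor, the transactional entity can simply filter the source down to the current year. Here is a minimal sketch in Power Query M, assuming a hypothetical Azure SQL source with a `dbo.Sales` table and an `OrderDate` column; replace the server, database, and column names with your own:

```powerquery
let
    // Hypothetical source — substitute your own server and database
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Keep only rows belonging to the current year
    CurrentYear = Date.Year(DateTime.LocalNow()),
    Transactional = Table.SelectRows(Sales, each Date.Year([OrderDate]) = CurrentYear)
in
    Transactional
```

Because the filter is evaluated at refresh time, the entity naturally rolls forward into each new year without any manual change.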
Entity for historical data
This entity stores only the historical data from the previous year and older. It can be scheduled if needed, or triggered manually once a year.
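The historical entity is the mirror image of the transactional one: the same assumed source, with the year filter reversed. A sketch in Power Query M, using the same hypothetical `dbo.Sales` table:

```powerquery
let
    // Same hypothetical source as the transactional entity
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Keep rows from the previous year and older
    CurrentYear = Date.Year(DateTime.LocalNow()),
    Historical = Table.SelectRows(Sales, each Date.Year([OrderDate]) < CurrentYear)
in
    Historical
```

Together, the two filters partition the source cleanly, so no row is loaded by both entities.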
Incremental refresh for Pro users
Power BI Dataflow adds this functionality only for Premium workspaces; for Pro users on standard workspaces, the functionality is limited.
For our scenario, Power BI Dataflow does not allow incremental refresh of entities for Pro accounts, so we need a workaround to preserve history. Instead of creating two entities in the same dataflow, create two different dataflows, each with its own entity, and schedule each dataflow to refresh as needed.
Create two dataflows: one containing the transactional entity and the other the historical entity.
Schedule the dataflows as we would for any Power BI dataset in the service. We can schedule the transactional dataflow daily so that it starts incremental loads, while the historical dataflow is triggered manually.
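When a report connects to both dataflows (via Get Data -> Power BI dataflows in Power BI Desktop), the two entities can be appended back into a single table. A sketch in Power Query M, assuming the two entities have already been loaded as queries named `Historical` and `Transactional` (hypothetical names) with identical column schemas:

```powerquery
let
    // Append current-year rows after the historical rows;
    // both queries must share the same column names and types
    AllSales = Table.Combine({Historical, Transactional})
in
    AllSales
```

This keeps the heavy historical load out of the daily refresh while still presenting one continuous table to report authors.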
Therefore, Power BI Dataflow can replace traditional ETL and Azure ELT processes, while also:
- Reducing the refresh time
- Creating a more reliable process
- Reducing data consumption
- Improving the performance of Power BI Reports and Dashboards
Learn more about our Microsoft Power BI offerings here.