How easy is it to move data from a plethora of sources into a single large repository and use it as a data source?
Add to that the choice of data formats, column remapping, inclusion or exclusion of fields, and a whole new way of maintaining consistency across data schemas. What if all of this could be done within minutes, using an online tool?
Most users are familiar with Microsoft Flow, the tool that lets them set up triggers (send emails, copy attachments, and so on), receive updates, and automate many other services on the go. Now, with Dataflows, Microsoft has brought that same on-the-go convenience to data movement and preparation, integrated into the Power BI Service.
Dawn of Data Preparation on the Power BI Service:
Power BI has recently introduced yet another strong feature, this time focusing on ETL and data preparation on the Power BI Service. This takes the self-service capabilities of Power BI up an extra notch. The feature is currently in Preview, and the only storage destination supported right now is Azure Data Lake Storage Gen2.
Demo walkthrough of a simple dataflow and discussion of its features:
Creating a new Data Flow:
- Source: Azure SQL Database
- Schema: demo
- Table: Sample_Data
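The source definition above can be sketched in Power Query M. This is a minimal sketch, assuming a hypothetical server and database name; only the schema (demo) and table (Sample_Data) come from the walkthrough.

```m
// Hypothetical connection — replace server and database with your own
let
    Source = Sql.Database("yourserver.database.windows.net", "YourDatabase"),
    // Navigate to the demo schema and the Sample_Data table
    demo_Sample_Data = Source{[Schema = "demo", Item = "Sample_Data"]}[Data]
in
    demo_Sample_Data
```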
This screen will look familiar to anyone who has worked with the Power BI desktop application. Yes, it is the Power Query Editor, now decoupled from the desktop version and able to function independently on the Power BI Service.
Adding Conditional Columns:
Use “Add Conditional Column” to define a new column based on a reference column.
For a date-based column, the conditions offered relate to the timeline (is before, is after, and so on).
Column Name: SalesQuarter
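In M, the step behind such a conditional column looks roughly like the sketch below. The date column name (OrderDate) and the quarter labels are assumptions for illustration, not taken from the demo.

```m
// Derive SalesQuarter from a date column (OrderDate is a placeholder name)
#"Added Conditional Column" = Table.AddColumn(
    Sample_Data,
    "SalesQuarter",
    each if Date.QuarterOfYear([OrderDate]) = 1 then "Q1"
         else if Date.QuarterOfYear([OrderDate]) = 2 then "Q2"
         else if Date.QuarterOfYear([OrderDate]) = 3 then "Q3"
         else "Q4",
    type text
)
```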
Generate a ready-made Index column if the dataset is missing a primary key:
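A minimal sketch of the corresponding M step, assuming the table is named Sample_Data:

```m
// Add a surrogate key: a new "Index" column starting at 1, incrementing by 1
#"Added Index" = Table.AddIndexColumn(Sample_Data, "Index", 1, 1)
```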
Adding a new calculated column:
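A calculated column is expressed in M as a custom column. In this sketch, Quantity and UnitPrice are hypothetical field names introduced purely for illustration.

```m
// Hypothetical calculated column; Quantity and UnitPrice are assumed fields
#"Added Custom" = Table.AddColumn(
    Sample_Data,
    "LineTotal",
    each [Quantity] * [UnitPrice],
    type number
)
```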
One last feature that dataflows integrate is the “Advanced Editor”, which exposes the complete query behind the table: the exact steps that load and transform the data into the shape of the final table.
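Put together, the Advanced Editor for a table like the one in this demo would show a single M script along these lines. The connection details and the OrderDate column are placeholders, not the actual demo values.

```m
let
    // Placeholder connection details
    Source = Sql.Database("yourserver.database.windows.net", "YourDatabase"),
    demo_Sample_Data = Source{[Schema = "demo", Item = "Sample_Data"]}[Data],
    // Surrogate key in place of a missing primary key
    #"Added Index" = Table.AddIndexColumn(demo_Sample_Data, "Index", 1, 1),
    // Conditional column derived from an assumed OrderDate field
    #"Added Conditional Column" = Table.AddColumn(
        #"Added Index",
        "SalesQuarter",
        each "Q" & Number.ToText(Date.QuarterOfYear([OrderDate])),
        type text)
in
    #"Added Conditional Column"
```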
Common Data Model:
With the CDM, Microsoft provides a variety of data schemas that are widely used across industries. Users can map the columns in their table to those available in the standard schemas and then deploy the dataflow. This encourages standardized column naming and supplies ready-made schemas for fast ETL model building.
Configuring the created Data flow:
Using the Power BI Data Flow as a data source from Power BI Desktop:
Use the “Get Data” feature to search the available data sources and choose “Power BI dataflows”.
Everything else works the same way as consuming data into Power BI from any other source.
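Behind the scenes, the query that Power BI Desktop generates navigates from workspace to dataflow to entity via the PowerBI.Dataflows connector. The IDs and entity name below are placeholders; the exact navigation keys may differ slightly in your generated query.

```m
let
    Source = PowerBI.Dataflows(null),
    // Placeholder IDs — pick your own workspace, dataflow and entity
    Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Sample_Data = Dataflow{[entity = "Sample_Data"]}[Data]
in
    Sample_Data
```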
Operations performed through Dataflows:
- Data Movement
- Choosing Columns to be Included
- Column Transformations
- Data Type Remapping
- Adding Conditional Columns
- Adding Calculated Columns
- Scheduled Refresh of Dataflows
- Incremental Refresh of Dataflows (Premium-only feature)
- Linked Entities
What is the actual reason behind the concept of Power BI Dataflows?
Dataflows create a backend folder structure in Azure Data Lake Storage Gen2 that holds an actual copy of the data. This data can be connected to directly from desktop and cloud applications for further analysis, transformation, and advanced insights. Below are a few of the cloud locations from which the data copied by Dataflows can be consumed.
Watch this space for more exciting blogs that follow Microsoft’s footsteps in achieving complete BI supremacy.