This is part of the SAP Lumira Discovery blog series.

A very common need for enterprise data consumers is the ability to merge data from multiple sources. An example would be the need to validate sales data against budget data stored in a spreadsheet (or another database).

In SAP Lumira Discovery, you can combine data from different sources in three different ways.

  1. Append: Addition of new rows to the existing data set
  2. Merge: Adding new columns to the existing dataset
  3. Blending: Linking data which makes business sense

 

1. Append Data

Appending two datasets is similar to performing a union between them. The mandatory prerequisite for appending one dataset to another is that both the datasets must have equal number of columns of similar data types.

Combine Multiple Data Sources in SAP Lumira Discovery - 1

For example, let us assume that the sales orders for 2011-13 is present in dataset A and for year 2014 is present in a separate dataset B.

To perform the append operation, we now need to do the following:

Step 1.1 – Adding the data sources:

Open SAP Lumira Discovery and add the data sets by clicking the below highlighted icon

Combine Multiple Data Sources in SAP Lumira Discovery - 2.1

Step 1.2 – Appending the data sources:

Click on the ‘Append Datasets’ icon to perform the append operation. In this process, let us append the dataset B to dataset A.

Combine Multiple Data Sources in SAP Lumira Discovery - 3.1

Step 1.3 – Mapping the fields:

Now map the respective fields that correspond to each other.

Combine Multiple Data Sources in SAP Lumira Discovery - 5.1

Note that the number of fields in both the data sources remains the same before and after the append operation. Only the number of records vary.

2. Merging Data

While Append adds the records of data set B to data set A, Merge adds columns from dataset B to dataset A. The prerequisite to have a common key column of the same data type.

Merging can be done in two different ways:

Combine Multiple Data Sources in SAP Lumira Discovery - 10

Inner Join: In this method, the resulting dataset will only retain records with the common key.

Combine Multiple Data Sources in SAP Lumira Discovery - 11

Left Outer Join: In this method, all the records from the first dataset would be retained. For records that do not have a matching row in dataset B, null values would be filled in the new columns.

The result of the two types of merge operations are outlined below.

Combine Multiple Data Sources in SAP Lumira Discovery - 8.1

Let us consider the following datasets outlining Co2 emissions for 2011 and 2015.

Dataset A – for year 2015

combine multiple data sources - 1

Dataset B – for year 2011

combine multiple data sources - 2

The result of the two types of merge operations are outlined below.

combine multiple data sources - 3

3. Data Blending

Data blending can be done when there is no common key but having a common column (e.g. Country) using which the data can be blended.

Let us assume two data sets – one set having sales actuals at state (and thus the country) level, and another dataset having the sales budget – but ONLY at the country level.

In such cases, blending is recommended over doing a merge.

Reason: If you were to merge the datasets, the budget would be repeated for every record at the state level, and aggregation would add up repetitive budget values for the same country. At the same time, blending would look up the budget data after the first data set has been aggregated. The concept of blending is also known as a post-aggregate join – meaning that the lookup happens after aggregation – and is a very powerful feature of Self Service BI tools.

Some key differences between data merging & data blending are as follows:

  • Data Merging happens at the data level, while Data Blending happens at the UI/visualization level; blending is done in a specific scenario to solve a specific question.
  • Blending usually happens when the two data sets are of different granularity
  • Blending performs the role of an excel VLOOKUP whereas merging performs the function of a SQL Join

Let us see when we should use data blending instead of data merging with an example.

When the second data set does not have a primary key to lookup, the merge is not possible. In the case below, Manager is the common dimension between the datasets.

combine multiple data sources - 4

In this case, we have to use data blending. The linked data in this case will contain the correct values for net value when compared to that of merged data.

combine multiple data sources - 5

The type of data linking can be changed as shown below:

  • Maximize the visualization in which linked data is used.
  • Right click on blend link in the “Datasets in use panel”.
  • You can change the type of join which includes Left Outer, Inner, Full Outer and Exception.

combine multiple data sources - 6

Full outer join is not available while merging. As these joins will apply in the visualization level and not in dataset level, there will not be any performance issue even when we use Full Outer join.

There are eventually three approaches we can take based upon the final storyboard,

  • Live Connection: HANA and BW have live connection options and limited combinations would be possible only with their respective live sources.
  • Imported Data: If the data is imported (SAP, non-SAP, flat files), we can perform all the combinations across the data sources.
  • Independent Data: This approach is when no combination is needed between the data and all visualizations on the storyboard will have individual sources

Interested to learn how to leverage SAP Lumira for your enterprise (or) train your business users in SAP Lumira Discovery? Reach out to us today.

Subscribe to our Newsletter

5920 Windhaven Pkwy
Plano TX 75093.

+1 888-227-2794

+1 972-232-2233

+1 888-227-7192

solutions@visualbi.com