The previous article explained the purpose of VORA and its core features. This part of the series discusses VORA’s new engines: the time series, graph, and document store engines.

Time Series Engine

VORA’s time series engine handles timestamped data such as sensor or transaction-based data (e.g. IoT, web log, and clickstream data). With a traditional database, processing time series data can be difficult because of its volume and velocity. Such data calls for proper compression techniques (Fig. 1), partitioning schemes, and granularization support. With a suitable compression technique, only the data points that represent the trend, or that deviate by more than a specified error percentage, need to be recorded, which saves storage space compared with recording every data point.

Fig. 1: Compression Techniques
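To make the idea of error-bounded compression concrete, here is a minimal Python sketch. It is not VORA’s actual algorithm: it assumes a simple "deadband" rule that keeps a point only when it differs from the last recorded value by more than a given percentage. The function name `compress` and the sample readings are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


def compress(points: List[Point], max_error_pct: float) -> List[Point]:
    """Keep a point only if it deviates from the last kept value by more
    than max_error_pct percent; the first and last points are always kept."""
    if not points:
        return []
    kept = [points[0]]
    for ts, value in points[1:-1]:
        last_value = kept[-1][1]
        # Relative deviation from the last recorded value, in percent.
        baseline = abs(last_value) or 1.0  # avoid division by zero
        deviation_pct = abs(value - last_value) / baseline * 100.0
        if deviation_pct > max_error_pct:
            kept.append((ts, value))
    if len(points) > 1:
        kept.append(points[-1])
    return kept


# Made-up sensor readings: a flat stretch, a jump, then another flat stretch.
readings = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 23.0), (4, 23.1), (5, 23.0)]
compressed = compress(readings, max_error_pct=5.0)
```

With a 5% threshold, only the first point, the jump at timestamp 3, and the last point are stored, so six readings shrink to three while the overall shape of the series is preserved.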

Here is an example use case that demonstrates aggregation and column functions on sales data used in Walmart’s sales forecasting. In this dataset, sales made by different departments and other measures such as temperature, fuel price, CPI (Consumer Price Index), and unemployment rate were recorded weekly. The dataset was first copied into HDFS and then loaded into VORA’s time series engine (Fig. 2).

 

Fig. 2: Scripts to Load the Time Series Data Set

Once the data is loaded, it can be analyzed by simply calling built-in functions. Queries can be executed in VORA Tools, the Spark shell, or Zeppelin. Table 1 shows the function commands and their outputs in Zeppelin.

Table 1

Trend: Fits a linear regression line to the data points within the specified period and returns the slope of the regression. A positive value indicates an increasing sales trend.

Median: Aggregation functions (sum, average, median, mode, min, max, count) provide a summary of each column.

Histogram: Creates a histogram of the selected column with the specified number of bins.

Granulize: Changes the interval between timestamps and returns the new series (e.g. from 7 days to 1 month).
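The functions in Table 1 can be illustrated outside VORA with a short Python sketch. This is only an approximation of what such functions compute, not VORA’s SQL syntax or implementation; the helper names (`trend_slope`, `histogram`, `to_monthly`) and the weekly sales values are made up for illustration and do not come from the Walmart dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Dict, List, Tuple


def trend_slope(values: List[float]) -> float:
    """Least-squares slope of values against their index 0..n-1 (needs n >= 2)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def histogram(values: List[float], bins: int) -> List[int]:
    """Count values in `bins` equal-width buckets between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def to_monthly(series: List[Tuple[date, float]]) -> Dict[Tuple[int, int], float]:
    """Re-granularize a weekly series into monthly totals keyed by (year, month)."""
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for day, value in series:
        totals[(day.year, day.month)] += value
    return dict(totals)


# Hypothetical weekly sales figures.
weekly_sales = [
    (date(2012, 1, 6), 100.0),
    (date(2012, 1, 13), 110.0),
    (date(2012, 1, 20), 120.0),
    (date(2012, 1, 27), 130.0),
    (date(2012, 2, 3), 140.0),
]
sales = [value for _, value in weekly_sales]
slope = trend_slope(sales)          # positive slope: sales are increasing
mid = median(sales)
counts = histogram(sales, bins=2)
monthly = to_monthly(weekly_sales)
```

Here the slope is positive because sales grow by the same amount every week, and the monthly view sums the four January weeks into one value, which is the kind of interval change the Granulize row describes.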

Graph Engine

For a large, highly linked data set, storing the data in a graph engine can increase query efficiency. Graph models handle one-to-many relationships more simply by converting the foreign keys of an RDBMS into relations between nodes, which reduces the complexity of joining tables. Also, like the document store engine, the graph engine is schema-free: data types do not need to be known before data is inserted, which makes it suitable for adding new data with different structures in real time.

Fig. 3: Relational Model vs Graph Model

To load a dataset into VORA’s graph engine, the data should be in a JSG file. JSG is a line-based JSON format that consists of the following elements:

Fig. 4: JSG File Format Example

Common graph functions are available, such as calculating the shortest path between two nodes (directed or undirected), finding node degrees (in, out, sum), finding connected components (strong, weak), and graph pattern matching. Example SQL commands and their outputs using the movie data are available here. Figure 5 summarizes how the VORA graph engine differs from the HANA graph engine.

Fig. 5: Comparing HANA Graph vs VORA Graph Engines
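For readers unfamiliar with the graph functions mentioned above, the following Python sketch shows what a directed shortest path and node degrees mean on a tiny graph. It does not use VORA’s SQL functions; the adjacency list, node names, and helper functions are hypothetical and only loosely modeled on a movie data set.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Directed edges as an adjacency list; all node names are made up.
edges: Dict[str, List[str]] = {
    "Alice": ["Matrix"],
    "Bob": ["Matrix", "Speed"],
    "Matrix": ["Keanu"],
    "Speed": ["Keanu"],
    "Keanu": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the nodes of a shortest directed path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def degrees(graph: Dict[str, List[str]], node: str) -> Tuple[int, int, int]:
    """Return (in-degree, out-degree, sum) for a node."""
    out_degree = len(graph.get(node, []))
    in_degree = sum(node in targets for targets in graph.values())
    return in_degree, out_degree, in_degree + out_degree


path = shortest_path(edges, "Alice", "Keanu")
no_path = shortest_path(edges, "Keanu", "Alice")  # edges are directed
matrix_degrees = degrees(edges, "Matrix")
```

Because the edges are directed, a path exists from "Alice" to "Keanu" but not in the reverse direction, which is the distinction behind the directed and undirected variants of the shortest-path function.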

Document Store Engine

Just like MongoDB or CouchDB, VORA’s document store engine accepts document-oriented data. Document-oriented data sets are semi-structured, meaning that documents (analogous to rows in an RDBMS) in the same collection (analogous to a table in an RDBMS) can have different fields. In a relational database, all rows in a table must have the same structure, and the data type of each attribute must be defined before the data is imported. With the document store engine, no predefined table schema is needed to import data. Figure 6 shows an example of two documents that can be stored in one collection even though they do not share the same fields or data types. Adding these two records to a single table is not possible in an RDBMS because of the enforced table schema.

Fig. 6: Example Documents in JSON Format
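The schema-free behavior described above can be mimicked in a few lines of Python. This is a sketch of the concept only, not of VORA’s document store API; the two documents, their field names, and the helpers `find` and `field_names` are invented for illustration.

```python
from typing import Any, Dict, List

# Two documents with different fields and types in one collection (made-up data).
collection: List[Dict[str, Any]] = [
    {"name": "Ann", "age": 34, "address": {"city": "Dallas", "state": "TX"}},
    {"name": "Raj", "skills": ["SQL", "Spark"], "age": "unknown"},
]


def find(docs: List[Dict[str, Any]], field: str, value: Any) -> List[Dict[str, Any]]:
    """Return documents whose top-level `field` equals `value`; documents
    that lack the field are skipped rather than causing an error."""
    return [doc for doc in docs if field in doc and doc[field] == value]


def field_names(docs: List[Dict[str, Any]]) -> List[str]:
    """Union of all top-level field names across the collection."""
    return sorted({key for doc in docs for key in doc})


with_skills = find(collection, "skills", ["SQL", "Spark"])
all_fields = field_names(collection)
```

Note that `age` is a number in one document and a string in the other, and only one document has `skills`. A relational table would reject this mix, whereas a collection simply stores both documents and lets queries ignore fields that are absent.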

The query syntax is the same as in the relational engine, except that queries need to be wrapped in back quotes (` `). Tables in the document store engine can be joined with tables in other engines (Fig. 7).

Fig. 7: Data Modeler in VORA Tools Showing Tables from All Engines

Additionally, VORA has a disk engine, which saves data to disk instead of memory. Tables saved in VORA’s disk engine can be directly accessed in SAP HANA using a HANA Wire Connector. VORA’s connections to other SAP products will be covered separately in another blog post.

To summarize, VORA provides diverse engines on one platform, where you can combine various data sources. This removes the complication of setting up individual tools and makes it easy to bring processed data into SAP HANA.


The next part of the series will compare machine learning applications using SAP PAL and SparkML in detail and show how VORA can help with this process.
