Azure Synapse is the evolution of Azure SQL Data Warehouse and is intended to bridge the gap between data lakes and data warehouses. It integrates analytics capabilities into a single service, bringing enterprise data warehousing and big data analytics together. Azure Synapse is billed as the industry’s first enterprise-class cloud data warehouse that can grow, shrink and pause in seconds. We can query huge amounts of data either serverless on-demand or with provisioned Azure resources. In the serverless model, storage and compute power scale automatically.
How does it work
Synapse uses a node-based architecture that distributes the computational processing of data across multiple nodes. Compute and storage are independent of each other, and the number of compute nodes ranges from 1 to 60. Compute power is measured in Data Warehouse Units (DWUs).
As the architecture diagram shows, the control node is the single point that all applications connect to. Its Massively Parallel Processing (MPP) engine optimizes each query for parallel execution and passes the work to the compute nodes, which run it in parallel. Synapse also uses the Data Movement Service (DMS), an internal service that automatically moves data across compute nodes.
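The fan-out from the control node to the compute nodes can be pictured with a small toy model. This is only a sketch, not the service's actual scheduler: it assumes the fixed set of 60 distributions that a dedicated SQL pool uses, and the function name `assign_distributions` is made up for illustration.

```python
# Toy model of how work could be spread across compute nodes.
# Assumption: the pool has a fixed set of 60 distributions that are
# divided evenly among however many compute nodes are provisioned.

DISTRIBUTIONS = 60


def assign_distributions(compute_nodes: int) -> dict[int, list[int]]:
    """Map each compute node to the distribution ids it would own."""
    if not 1 <= compute_nodes <= DISTRIBUTIONS:
        raise ValueError("compute_nodes must be between 1 and 60")
    if DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("this sketch only handles node counts that divide 60")
    per_node = DISTRIBUTIONS // compute_nodes
    return {
        node: list(range(node * per_node, (node + 1) * per_node))
        for node in range(compute_nodes)
    }


layout = assign_distributions(4)
print(len(layout[0]))  # 15 distributions per node when there are 4 nodes
```

With one node, that node handles all 60 distributions; with 60 nodes, each node handles exactly one, which is why adding nodes shortens the work each node has to do.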
One of the key features of Azure Synapse is that compute and storage are independent. We can scale compute power up or down without any data loss, and we can pause compute entirely so that we pay only for storage.
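The effect of pausing compute on the bill can be worked through with simple arithmetic. The sketch below uses made-up hourly and per-terabyte rates (they are placeholders, not Azure prices) and a hypothetical helper named `monthly_cost`.

```python
# Hypothetical rates for illustration only; these are NOT real Azure prices.
COMPUTE_RATE_PER_HOUR = 1.5   # cost of running compute for one hour
STORAGE_RATE_PER_TB = 20.0    # cost of storing one terabyte for a month


def monthly_cost(compute_hours: float, storage_tb: float) -> float:
    """Compute is billed only for the hours it runs; storage is billed always."""
    return compute_hours * COMPUTE_RATE_PER_HOUR + storage_tb * STORAGE_RATE_PER_TB


always_on = monthly_cost(compute_hours=720, storage_tb=2)      # never paused
business_hours = monthly_cost(compute_hours=160, storage_tb=2)  # paused off-hours
fully_paused = monthly_cost(compute_hours=0, storage_tb=2)      # storage only

print(always_on, business_hours, fully_paused)
```

Whatever the real rates are, the storage term stays constant while the compute term drops to zero when the pool is paused.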
Some of the features that set Azure Synapse apart are discussed here.
Azure Synapse storage uses several distribution techniques to optimize performance. The work of each query is divided into 60 smaller queries that run in parallel on the compute nodes, one for each data distribution. The three distribution types are:
Hash – Highest query performance for joins and aggregations on large tables
Round Robin – Simple and quick to create
Replicate – Fast query performance for small tables
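To make the three strategies concrete, here is a minimal simulation of where rows would land. It is a sketch under assumptions: `zlib.crc32` stands in for the service's internal hash function, which is not public, and all function names and sample rows are invented for illustration.

```python
import zlib
from itertools import count

DISTRIBUTIONS = 60


def hash_distribution(key: str) -> int:
    """Hash: rows with the same key always land in the same distribution."""
    # crc32 is a stand-in; the real hash function is internal to the service.
    return zlib.crc32(key.encode("utf-8")) % DISTRIBUTIONS


def make_round_robin():
    """Round robin: rows are dealt out evenly, regardless of their content."""
    counter = count()

    def next_distribution() -> int:
        return next(counter) % DISTRIBUTIONS

    return next_distribution


def replicate(rows: list, compute_nodes: int) -> dict[int, list]:
    """Replicate: every compute node keeps a full copy of a small table."""
    return {node: list(rows) for node in range(compute_nodes)}


orders = [("cust-1", 10), ("cust-2", 25), ("cust-1", 5), ("cust-3", 40)]

hash_placement = [hash_distribution(customer) for customer, _ in orders]
next_rr = make_round_robin()
rr_placement = [next_rr() for _ in orders]
copies = replicate([("cust-1", "Ada"), ("cust-2", "Grace")], compute_nodes=2)

# Both "cust-1" orders share a distribution, so a join or aggregation
# on customer needs no data movement between nodes.
print(hash_placement[0] == hash_placement[2])  # True
print(rr_placement)                            # [0, 1, 2, 3]
```

The trade-off shows up directly: hash placement keeps related rows together, round robin needs no key at all, and replication trades extra storage for local lookups on every node.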
Azure SQL Data Warehouse did not have an effective way of managing workloads after creation. Workload management in Azure Synapse introduces the following concepts to give us more control over how a workload uses system resources.
- Workload Classification
- Workload Isolation
- Workload Importance
The portal lets us monitor query activity and resource utilization in workload groups.
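The three workload concepts can be pictured with a toy model: classification maps a request to a workload group, isolation reserves a share of resources for each group, and importance decides which queued request runs first. This is a simplified sketch, not Synapse's scheduler; the users, groups and percentages are hypothetical.

```python
from dataclasses import dataclass

# Importance levels, lowest to highest.
IMPORTANCE = ["low", "below_normal", "normal", "above_normal", "high"]


@dataclass(frozen=True)
class Classifier:
    """Classification rule: which user maps to which workload group."""
    user: str
    group: str
    importance: str = "normal"


# Isolation: share of resources reserved per group (hypothetical values).
RESERVED_PERCENT = {"etl": 50, "reporting": 25, "default": 0}

# Hypothetical classifiers for two service accounts.
CLASSIFIERS = [
    Classifier(user="loader", group="etl", importance="high"),
    Classifier(user="analyst", group="reporting", importance="below_normal"),
]


def classify(user: str) -> Classifier:
    """Return the first matching rule, or fall back to the default group."""
    for rule in CLASSIFIERS:
        if rule.user == user:
            return rule
    return Classifier(user=user, group="default")


def run_order(queued_users: list[str]) -> list[str]:
    """Higher importance runs first; ties keep their arrival order."""
    return sorted(
        queued_users,
        key=lambda user: -IMPORTANCE.index(classify(user).importance),
    )


print(run_order(["analyst", "guest", "loader"]))  # ['loader', 'guest', 'analyst']
```

Here the load job jumps the queue because of its importance, while the reserved percentages guarantee that reporting still has resources when the load is heavy.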
Azure Synapse comes with the following techniques to handle security.
- Firewall rules – Server-level IP firewall rules
- Connection Encryption
- Authentication – SQL authentication and Azure Active Directory
- Authorization privileges – Using roles and permissions
- Data Protection – Dynamic Data Masking and Transparent Data Encryption
In addition to the above techniques, Azure Synapse has an Advanced Data Security option for highly sensitive data. It helps us discover and classify sensitive data, and its Advanced Threat Protection monitors the database for threats.
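As an illustration of the idea behind dynamic masking, the sketch below hides an email address at read time for users who lack permission to see it, while the stored value stays unchanged. The output format is loosely modeled on the `aXXX@XXXX.com` style of email mask; the function names are hypothetical and this is not how the service implements it.

```python
def mask_email(email: str) -> str:
    """Keep the first character and hide the rest of the address."""
    if email.find("@") < 1:
        # Not a recognizable address: hide it completely.
        return "XXXX"
    return email[0] + "XXX@XXXX.com"


def read_email(stored_value: str, can_unmask: bool) -> str:
    """Masking is applied when reading; the stored value is never altered."""
    return stored_value if can_unmask else mask_email(stored_value)


stored = "ada.lovelace@example.org"
print(read_email(stored, can_unmask=False))  # aXXX@XXXX.com
print(read_email(stored, can_unmask=True))   # ada.lovelace@example.org
```

Transparent Data Encryption works at a different layer: it protects the data at rest, whereas masking only changes what a given user sees in query results.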
Analytics with Azure Synapse
Azure Synapse stands apart from other cloud data warehouses in its unified approach to warehousing and analytics services. With Azure Synapse we can ingest, prepare, manage and serve data for immediate BI and machine learning needs. Azure Synapse has four components.
- Synapse SQL
- Apache Spark
- Synapse Pipelines
- Synapse Studio
We have covered Synapse SQL, the part that was already generally available as Azure SQL Data Warehouse. Azure Synapse offers 85+ connectors to load data. In addition, Azure Synapse can provide a Spark environment in notebooks, similar to Databricks, that supports multiple languages: PySpark (Python), Spark (Scala), .NET for Spark (C#) and Spark SQL.
Data can be orchestrated in Azure Synapse using pipelines, and transformations such as aggregations and joins can be done with Data Flows, as in Azure Data Factory. In this way, Azure Synapse offers a complete set of ELT tools along with additional analytics features, including machine learning and visualization with Power BI.
With Azure Synapse Studio, Azure Synapse brings a single workspace for data professionals to work with their data.