Snowflake is a data warehouse built exclusively for the cloud. Unlike traditional shared-disk and shared-nothing architectures, Snowflake uses a multi-cluster, shared-data architecture that is faster, easier to use, and highly scalable. In this blog, we cover the various data ingestion techniques in Snowflake.
Data Ingestion using Web Interface
The most straightforward way to ingest data into Snowflake is through the Snowflake web interface. However, the wizard supports loading only a small number of files of limited size (up to 50 MB).
Once you have created the table, click it to open the table details page and select the Load Table option. This opens the Load Data wizard, which loads the file into your table.
1. Select the warehouse to use for loading the data.
2. Once you have selected the warehouse, choose whether to load files from your local machine or, if you have Amazon S3 storage in your landscape, from there.
3. Select a file format for your data files from the dropdown list. You can also create a new named file format.
4. Once the file format is chosen, specify how Snowflake should handle errors during the load. Finally, click the Load button. Snowflake then loads the data file into the specified table via the selected warehouse.
Loading data from the internal stage using the COPY command
To bulk-load data files from your local machine into Snowflake, use SnowSQL to upload the files to one of the following Snowflake stages:
- Named Internal Stage
- Internal Stage for the specified table
- Internal Stage for the current user
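Each of these stage types is addressed differently in SQL. As a rough sketch (the stage, table, and file names here are placeholders, not from the original post):

```sql
-- Named internal stage: created explicitly, referenced with @stage_name
CREATE STAGE my_internal_stage;
PUT file:///tmp/data.csv @my_internal_stage;

-- Table stage: created automatically with each table, referenced with @%table_name
PUT file:///tmp/data.csv @%mytable;

-- User stage: one per user, referenced with @~
PUT file:///tmp/data.csv @~/staged;
```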
Once files are staged, the data in the files can be loaded into the table using the COPY INTO command. To use SnowSQL, download and install it from the Snowflake web user interface.
- Once installed, open a command-line window and type snowsql -v to check the installed version.
- Connect to SnowSQL with your account name and username. You will be prompted for a password; once provided, you are connected to your Snowflake account.
- Once connected, set the database, schema, and warehouse used for the processing.
- Use the PUT command to upload the data files from your local machine to a Snowflake stage (named internal, user, or table stage). Here we have created a named stage "INTERNAL_STAGE" and uploaded the data file to that stage.
- Use the COPY INTO command to populate the table with the contents of the staged data files.
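The steps above can be sketched as a minimal SnowSQL session. The warehouse, database, schema, file path, table name, and file-format options below are placeholder assumptions, not values from the original post:

```sql
-- Set the session context for the load
USE WAREHOUSE my_wh;
USE DATABASE my_db;
USE SCHEMA my_schema;

-- Create a named internal stage and upload the local file to it
CREATE STAGE IF NOT EXISTS INTERNAL_STAGE;
PUT file:///tmp/employees.csv @INTERNAL_STAGE AUTO_COMPRESS = TRUE;

-- Populate the target table from the staged file
COPY INTO employees
  FROM @INTERNAL_STAGE
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  ON_ERROR = 'ABORT_STATEMENT';
```

AUTO_COMPRESS gzips the file during upload, which reduces transfer time; COPY INTO decompresses it automatically during the load.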
Loading data files staged in Amazon S3, Microsoft Azure, etc to Snowflake
If you already have an Amazon Web Services (AWS) account or a Microsoft Azure account in your landscape, you can use S3 buckets or Azure containers to store and manage the data files. You can bring these data files into Snowflake either by accessing the cloud storage directly in the COPY command or by staging the files in an external stage and referencing that stage instead. You can create an external stage either via the interface or by building the stage in the worksheet. For Amazon S3, you need an AWS key ID and secret key to access the bucket; for Azure, a Shared Access Signature (SAS) token needs to be generated to access the container.

```sql
CREATE STAGE "SF_TRYOUTS"."COE_TRYOUTS".S3_STAGE
  URL = 's3://test'
  CREDENTIALS = (AWS_KEY_ID = 'raghavia' AWS_SECRET_KEY = '********');
```

Then copy the data files into your table using the COPY INTO command, either by:
- Accessing the S3 bucket directly:

```sql
COPY INTO MYTABLE
  FROM 's3://test'
  CREDENTIALS = (AWS_KEY_ID = 'raghavia' AWS_SECRET_KEY = '********')
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);
```
- Accessing the external stage created above:

```sql
COPY INTO MYTABLE
  FROM @S3_STAGE
  FILE_FORMAT = (FORMAT_NAME = my_csv_format)
  PATTERN = '.*\.csv';
```
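For Azure, the equivalent external stage uses an azure:// URL and the SAS token mentioned earlier. The storage account, container, and token below are hypothetical placeholders:

```sql
-- Hypothetical Azure container; the SAS token is generated in the Azure portal
CREATE STAGE AZURE_STAGE
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer'
  CREDENTIALS = (AZURE_SAS_TOKEN = '?sv=...');

COPY INTO MYTABLE
  FROM @AZURE_STAGE
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);
```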
Loading data from various data sources into Snowflake
Having different data sources across the organization, each with different requirements, can pose a challenge. Data integration involves combining data from these sources so that users can query and manipulate it from a single interface and derive analytics and statistics. Snowflake works with a variety of data integration tools, such as Alooma, Fivetran, Stitch, and Matillion.