Azure Blob Storage Load (Snowflake)
The Azure Blob Storage Load component lets users load data into an existing table from objects stored in Azure Blob Storage.
External Stages and Metadata
If using an external stage, please be aware of the following information:
- All columns to be included in the load must be included in the Load Columns property, in the correct order and associated with the correct data type. The Create Table component can be used to specify the metadata of the columns.
- The Metadata Fields property will insert metadata columns at the end of the existing table and overwrite the same number of columns unless additional columns are added via the Create Table component.
- The table's data and structure are only accessed at runtime, meaning that additional columns must be added (and data types set) before the job is run. To ensure these columns are set up beforehand, users can load their data with Matillion ETL's Azure Blob Load Generator.
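For illustration, a target table might be created with trailing columns reserved for the metadata fields; the table and column names below are examples only, not values required by the component:

```sql
-- Example only: a target table whose last two columns are reserved for
-- Snowflake file metadata, so Metadata Fields does not overwrite data columns.
CREATE TABLE sales_data (
    order_id       NUMBER,
    order_date     DATE,
    amount         NUMBER(10,2),
    src_filename   VARCHAR,   -- to be populated from METADATA$FILENAME
    src_row_number NUMBER     -- to be populated from METADATA$FILE_ROW_NUMBER
);
```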
Properties
Name
= string
A human-readable name for the component.
Stage
= drop-down
Select a staging area for the data. Staging areas can be created through Snowflake using the CREATE STAGE command. Internal stages can be set up this way to store staged data within Snowflake.
Selecting [Custom] reveals additional properties for specifying a custom staging area on Azure Blob Storage.
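As a sketch, an external stage pointing at an Azure Blob Storage container can be created as follows; the stage name, URL, and SAS token are placeholders:

```sql
-- Placeholder values throughout; substitute your own account, container, and SAS token.
CREATE STAGE my_azure_stage
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer/path/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>');
```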
Stage Authentication
= drop-down
Select an authentication method for data staging.
- Credentials: Uses the credentials configured in the environment. If no credentials have been configured, an error will occur.
- Storage Integration: Use a Snowflake storage integration to authenticate data staging. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of allowed or blocked storage locations. To learn more, read Create Storage Integration.
Storage Integration
= drop-down
Select a Snowflake storage integration from the drop-down list. Storage integrations are required to permit Snowflake to read data from and write to your cloud storage location (Amazon S3, Azure Blob Storage, Google Cloud Storage) and must be set up in advance of selection. To learn more about setting up a storage integration for use in Matillion ETL, read Storage Integration Setup Guide. Only available when Stage Authentication is set to Storage Integration.
Note
Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage regardless of the cloud provider that hosts your Snowflake account.
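A minimal sketch of an Azure storage integration, assuming placeholder values for the integration name, tenant ID, and allowed location:

```sql
-- Placeholder values; run as a role with CREATE INTEGRATION privileges.
CREATE STORAGE INTEGRATION my_azure_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant-id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/');
```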
Azure Storage Location
= string
The Azure storage location (including file path) for any data to be loaded.
When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
Pattern
= string
A regular expression that will partially match all filenames to be included in the load. Defaults to '.*', matching all files within the Azure Storage Location.
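For illustration, the sketch below shows how a pattern restricts a load to matching filenames in Snowflake's COPY INTO syntax; the table, stage, and pattern are examples only:

```sql
-- Illustrative only: load just the CSV files whose names contain 'sales_'.
COPY INTO sales_data
FROM @my_azure_stage/daily/
PATTERN = '.*sales_.*\\.csv'
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```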
Warehouse
= drop-down
The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.
Database
= drop-down
The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
= drop-down
The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Table
= drop-down
Select an existing table to load data into.
Load Columns
= dual listbox
Select which of the target table's columns to load. Move columns to the right using the arrow buttons to include them in the load. Columns on the left will be excluded from the load.
Format
= drop-down
Select a pre-made file format that will automatically set many of the properties accordingly. These formats can be created through the Create File Format component.
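As a sketch, a named file format equivalent to several of the CSV properties below might be created like this (the format name and option values are examples only):

```sql
-- Example values only; adjust the options to match your source files.
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  NULL_IF = ('NULL', '')
  COMPRESSION = GZIP;
```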
File Type
= drop-down
Select the expected file type that data will be loaded from. Supports AVRO, CSV, JSON, ORC, PARQUET and XML. Component properties will change to reflect the choice made here and give options based on the specific file type.
Compression
= drop-down
Whether the input file is compressed, and if so, in which format: gzip, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, or ZSTD.
Record Delimiter
= string
(CSV only) Input a delimiter for records. This can be one or more single-byte or multibyte characters that separate records in an input file.
Note
- Accepted values include: leaving the field empty; a newline character \n or its hex equivalent 0x0a; a carriage return \r or its hex equivalent 0x0d. Also accepts a value of NONE.
- If you set Skip Header to a value such as 1, then you should use a record delimiter that includes a line feed or carriage return, such as \n or \r. Otherwise, your entire file will be interpreted as the header row, and no data will be loaded.
- The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
- Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
- The default (if the field is left blank) is a newline character.
Field Delimiter
= string
The delimiter that separates columns. The default is a comma. A [TAB] character can be specified as \t.
Skip Header
= integer
(CSV only) Specify the number of rows to skip. The default is 0.
If Skip Header is used, the value of the record delimiter will not be used to determine where the header line is. Instead, the specified number of CRLF (carriage return + line feed) sequences will be skipped.
For example, if the value of Skip Header = 1, then Matillion ETL will skip to the first CRLF that it finds. If you have set the Record Delimiter property to a single character without a CRLF, then Matillion ETL skips to the end of the file (treating the entire file as a header).
Date Format
= string
Defaults to "auto". This can be used to manually specify a date format.
Time Format
= string
Defaults to "auto". This can be used to manually specify a time format.
Timestamp Format
= string
Defaults to "auto". This can be used to manually specify a timestamp format.
Escape
= string
(CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.
Escape Unenclosed Field
= string
(CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE. The default is a backslash (\).
If a character is specified in the "Escape" field, it will override this field.
If you have set a value in the Field Optionally Enclosed property, all fields will become enclosed, making this property redundant; in that case it will be ignored.
Trim Space
= drop-down
Removes trailing and leading whitespace from the input data.
Field Optionally Enclosed
= string
A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). Note that the character chosen can be escaped by that same character.
Null If
= string
Specify one or more strings (one string per row of the table) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with a SQL NULL value for that field in the loaded table. Click + to add a string.
Error On Column Count Mismatch
= drop-down
Generate an error if the number of delimited columns in the input does not match that of the table. If false, extra columns are not loaded into the table and missing columns are recorded as NULL in the table.
Empty Field As Null
= drop-down
If this is set, empty columns in the input file will become NULL.
Encoding Type
= drop-down
The type of encoding that has been applied to the data. This defaults to UTF-8.
Enable Octal
= drop-down
(JSON File type) Enables parsing of octal numbers.
Allow Duplicates
= drop-down
(JSON File type) Allows duplicate object field names (keeping only the last one used).
Strip Outer Array
= drop-down
(JSON File type) Enable removal of outer square brackets from JSONs.
Strip Null Values
= drop-down
(JSON File type) Enables removal of null values.
Ignore UTF8 Errors
= drop-down
(JSON/XML File type) When true, replaces invalid UTF-8 sequences with the Unicode replacement character (U+FFFD) instead of throwing an error.
Preserve Space
= drop-down
(XML File Type) Leading and trailing spaces in elements are preserved if set to true.
Strip Outer Element
= drop-down
(XML File Type) This will strip the outermost XML element, exposing the second-level elements as separate documents.
Disable Snowflake Data
= drop-down
(XML File Type) Will disable recognition of Snowflake semi-structured data tags if set to true.
Disable Auto Convert
= drop-down
(XML File Type) Disables automatic conversion of numeric and Boolean values from text to their native representations if set to true.
On Error
= drop-down
Choose what to do when this component encounters an error.
Size Limit (B)
= string
The upper limit file size for any individual file loaded.
Purge Files
= drop-down
Choose whether to destroy the source file after loading its data.
Truncate Columns
= drop-down
If this is set, any instance of data in the input file that is too long to fit into the specified target column width will be truncated to fit instead of causing an error.
Force Load
= drop-down
When TRUE, will load all files into Snowflake even if that file has not changed since a prior load. If FALSE (default), the component will attempt to avoid such redundant loads.
Metadata Fields
= dual listbox
Snowflake metadata columns available to include in the load.
Snowflake automatically generates metadata for files in internal stages (i.e., within Snowflake) and external stages (Amazon S3, Google Cloud Storage, or Microsoft Azure). This metadata is "stored" in virtual columns. These metadata columns are added to the staged data, but are only added to the table when included in a query of the table. For more information, read Querying Metadata for Staged Files.
This property is only available when an external stage is selected. To manage stages, click the Environments panel in the bottom-left, then right-click a Matillion ETL environment, and click Manage Stages. To learn more, read Manage Stages.
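For illustration, the virtual metadata columns can be queried directly against a stage before loading; the stage and file format names here are placeholders:

```sql
-- Placeholders throughout; $1 and $2 are the first two columns of the staged files.
SELECT METADATA$FILENAME,
       METADATA$FILE_ROW_NUMBER,
       t.$1,
       t.$2
FROM @my_azure_stage (FILE_FORMAT => 'my_csv_format') t;
```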
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
✅ | ❌ | ❌ | ❌ | ❌ |