Azure Blob Storage Unload

This component creates files on a specified Azure Blob Storage account and loads them with data from a table or view. By default, your data will be unloaded in parallel.


Properties (Snowflake)

Name = string

A human-readable name for the component.


Stage = drop-down

Select a staging area for the data. Staging areas can be created through Snowflake using the CREATE STAGE command. Internal stages can be set up this way to store staged data within Snowflake.

Selecting [Custom] reveals additional properties for specifying a custom staging area on Azure Blob Storage.
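For reference, a named external stage pointing at an Azure Blob Storage container can be created in Snowflake along the following lines. This is a minimal sketch only; the stage name, account, container, path, and SAS token are placeholders, not values from this document.

    -- Illustrative only: create a named external stage on Azure Blob Storage
    CREATE STAGE my_azure_stage
      URL = 'azure://myaccount.blob.core.windows.net/mycontainer/unload/'
      CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>');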


Azure Storage Location = string

Use the tree structure to select the Azure Blob Storage container that files will be unloaded into. Alternatively, provide the full URL manually in the field, following the template already provided: azure://<account>/<container>/<path>

Entering a forward slash character (/) after a folder name triggers validation of the file path. This works in the same manner as the Go button.


File Prefix = string

Specify a file prefix for unloaded data on the blob container. Each file is named as the prefix followed by a number denoting the node it was unloaded from. Unloads run in parallel and use the maximum number of nodes available at the time.


Authentication = drop-down

Select the authentication method. Users can choose either:

  • Credentials: Use Azure security credentials (configured in the Matillion ETL environment).
  • Storage Integration: Use a Snowflake storage integration. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of permitted or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure). More information can be found at CREATE STORAGE INTEGRATION; a minimal example is sketched below this list.
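As a hedged sketch only, a storage integration for Azure Blob Storage can be created in Snowflake roughly as follows. The integration name, tenant ID, and storage location are placeholder assumptions, not values from this document.

    -- Illustrative only: a Snowflake storage integration for Azure Blob Storage
    CREATE STORAGE INTEGRATION my_azure_integration
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      ENABLED = TRUE
      AZURE_TENANT_ID = '<tenant-id>'
      STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/');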

Warehouse = drop-down

The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Target Table = drop-down

Select the table or view to unload to blob storage.


Format = drop-down

Choose from preset file formats available in your Snowflake database. Additional file formats can be created using the Create File Format component. Selecting the [Custom] file format will use the component properties to define the file format.
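Named file formats are ordinary Snowflake objects, so they can also be created directly in SQL. The snippet below is illustrative only; the format name and option values are assumptions, not taken from this document.

    -- Illustrative only: a reusable named file format for CSV unloads
    CREATE FILE FORMAT my_unload_csv
      TYPE = CSV
      FIELD_DELIMITER = ','
      COMPRESSION = GZIP
      FIELD_OPTIONALLY_ENCLOSED_BY = '"';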


File Type = drop-down

Choose whether you would like Matillion ETL to unload the data in a CSV, JSON, or PARQUET file format.


Compression = drop-down

Select the compression format. Available compression formats for CSV and JSON include:

  • AUTO
  • BROTLI
  • BZ2
  • DEFLATE
  • GZIP
  • NONE (no compression)
  • RAW_DEFLATE
  • ZSTD

Available compression formats for PARQUET include:

  • AUTO
  • LZO
  • NONE (no compression)
  • SNAPPY

Record Delimiter = string

(CSV only) Specify a delimiter character to separate records (rows) in the file. Defaults to a newline. \n can also signify a newline, and \r can signify a carriage return.


Field Delimiter = string

(CSV only) Specify a delimiter character to separate columns. The default character is a comma.

A [TAB] character can be specified as \t.


Date Format = string

(CSV only) Defaults to auto. Use this property to manually specify a date format. More information...


Time Format = string

(CSV only) Defaults to auto. Use this property to manually specify a time format. More information...


Timestamp Format = string

(CSV only) Defaults to auto. Use this property to manually specify a timestamp format. More information...
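Under the hood, these CSV properties correspond to Snowflake file format options on the generated unload statement. The following is a hedged sketch of what such a statement might look like; the stage, prefix, table, and option values are illustrative assumptions, not taken from this document.

    -- Illustrative sketch of an unload with explicit CSV format options
    COPY INTO @my_azure_stage/unload_
    FROM my_schema.my_table
    FILE_FORMAT = (
      TYPE = CSV
      COMPRESSION = GZIP
      RECORD_DELIMITER = '\n'
      FIELD_DELIMITER = '\t'
      DATE_FORMAT = 'YYYY-MM-DD'
      TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS'
    );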

Escape = string

(CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.


Escape Unenclosed Field = string

(CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values, as well as a value of NONE. The default is \ (backslash).

If a character is specified in the "Escape" field, it will override this field.

If you have set a value in the Field Optionally Enclosed property, all fields become enclosed, which renders the Escape Unenclosed Field property redundant; it is then ignored.


Field Optionally Enclosed = string

A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). Note that the character chosen can be escaped by that same character.
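To illustrate how the enclosure and escape options interact, a hedged unload sketch might look like the following; the stage, prefix, table, and values are illustrative assumptions only.

    -- Illustrative sketch: enclosing string fields in double quotes
    COPY INTO @my_azure_stage/quoted_
    FROM my_schema.my_table
    FILE_FORMAT = (
      TYPE = CSV
      FIELD_OPTIONALLY_ENCLOSED_BY = '"'
      ESCAPE_UNENCLOSED_FIELD = NONE
    );
    -- With FIELD_OPTIONALLY_ENCLOSED_BY set, string fields are enclosed,
    -- so the unenclosed-field escape setting has no effect.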


Nest Columns = drop-down

(JSON only) When "True", the table columns are nested into a single JSON object per row so that the file is correctly formed. A table with a single variant column does not require this setting to be "True". The default setting is "False".
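The effect is comparable to unloading each row as a single JSON object, which in plain Snowflake SQL can be approximated with OBJECT_CONSTRUCT. This is a sketch under assumed object names, not necessarily the exact statement Matillion ETL generates.

    -- Illustrative sketch: unload each row as one JSON object
    COPY INTO @my_azure_stage/json_
    FROM (SELECT OBJECT_CONSTRUCT(*) FROM my_schema.my_table)
    FILE_FORMAT = (TYPE = JSON);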


Null If = string

Specify one or more strings (one per row) to represent NULL values in the unloaded files. When unloading, SQL NULL values in the table are converted to the first string in the list when written to the file. Click + to add a string.


Trim Space = drop-down

Removes trailing and leading whitespace from the input data.


Overwrite = drop-down

When "True", overwrite existing data (if the target file already exists) instead of generating an error. Default setting is "False".


Single File = drop-down

When True, the unload operation will work in serial rather than parallel. This results in a slower unload but a single, complete file.

The default setting is False.

When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether the file is compressed).

When False, a filename prefix must be included in the path.


Max File Size = integer

The maximum size (in bytes) of each file generated.

The default is 16000000 (16 MB). The maximum size is 5000000000 (5 GB).

For more information, see the Snowflake documentation.


Include Headers = drop-down

When "True", write column names as headers at the top of the unloaded files. Default is "False".

Properties (Delta Lake on Databricks)

Name = string

A human-readable name for the component.


Azure Storage Location = url

Your Azure Blob Storage location.


Credentials = drop-down

Choose which Azure credentials to use. Read manage credentials to learn more.


Catalog = drop-down

Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.


Database = drop-down

Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.


Table Name = drop-down

The table to unload to Azure Blob Storage.


File Type = select

Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. The properties below will change to reflect the selected file type.


Compression = drop-down

Set the compression type. The default is NONE. This setting does not apply to AVRO.


Record Delimiter = string

(CSV only) A delimiter character used in delimited text formats. The default is a comma.


Quote = string

(CSV only) Sets the quote character for delimited text formats to enclose values containing the delimiter character. The default is ".


Date Format = string

(CSV & JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd.


Timestamp Format = string

(CSV & JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].


Escape = string

(CSV only) Sets the escape character used in delimited text formats to escape special characters. The default is \.


Null Value = string

(CSV only) Sets the string representation of a null value in the unloaded data. The default value is an empty string.


Header = boolean

(CSV only) Specify whether to write a header row in the unloaded files. Default is "False".


Escape Quotes = boolean

(CSV only) Specify whether to escape quote characters. The default is "True".
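For the Delta Lake on Databricks variant, a comparable unload can be expressed in Spark SQL with INSERT OVERWRITE DIRECTORY. The sketch below is an assumption for illustration; the storage path, options, and table name are placeholders, and it is not necessarily the statement Matillion ETL generates.

    -- Illustrative Spark SQL sketch: write a table out as CSV to Azure Blob Storage
    INSERT OVERWRITE DIRECTORY 'wasbs://mycontainer@myaccount.blob.core.windows.net/unload/'
    USING CSV
    OPTIONS (header 'true', delimiter ',', quote '"')
    SELECT * FROM my_database.my_table;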


Copying files to an Azure Premium Storage blob

When copying files to an Azure Premium Storage blob, Matillion ETL may produce the following error:

Self-suppression not permitted.

This is because, unlike standard Azure Storage, Azure Premium Storage does not support block blobs, append blobs, files, tables, or queues. Premium Storage supports only page blobs that are incrementally sized.

A page blob is a collection of 512-byte pages that are optimised for random read and write operations. All writes must therefore be 512-byte aligned, so any file whose size is not a multiple of 512 bytes will fail to write. For example, a 1,024-byte file (2 × 512) can be written, but a 1,000-byte file cannot.

For additional information about Azure Storage blobs, we recommend consulting the Microsoft Azure documentation.

