S3 Unload
Creates files on a specified S3 bucket, and load them with data from a table or view.
For Snowflake users: by default, your data will be unloaded in parallel.
For Amazon Redshift users: your data will be unloaded in parallel by default, creating separate files for each slice on your cluster.
For Amazon Redshift users: this component is similar in effect to the Text Output component. Since S3 Unload unloads data in parallel directly from Amazon Redshift to S3, it tends to be faster than using Text Output. However, S3 Unload sacrifices some of the added functionality that comes from Text Output pulling the data through the Matillion ETL instance (such as adding column headers to each file).
To access an S3 bucket from a different AWS account, the following is required:
- Set up cross-account access via AWS roles.
- The user must type in the bucket they want to access or use a variable to load/unload to those structures.
Properties
Name
= string
A human-readable name for the component.
Stage
= drop-down
Choose a predefined stage for your data. These stages must be created from your Snowflake account console. Otherwise, "Custom" can be chosen for the staging to be based on the component's properties.
S3 Object Prefix
= string
The name of the file for data to be unloaded into.
When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
File Prefix
= string
Filename prefix for unloaded data to be named on the S3 bucket. Each file will be named as the prefix followed by a number denoting which node this was unloaded from. All unloads are parallel and will use the maximum number of nodes available at the time.
Encryption
= drop-down
Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.
- None: No encryption.
- Client Side Encryption: Encrypt the data according to a client-side master key. Read Protecting data using client-side encryption to learn more.
- SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more.
- SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.
KMS Key ID
= drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Master Key
= drop-down
The ID of the client-side encryption key you have chosen to use in the Encryption property.
Authentication
= drop-down
Select the authentication method. Users can choose either:
- Credentials: Uses AWS security credentials. Read Manage Credentials to learn more.
- Storage Integration: Uses a Snowflake storage integration. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of permitted or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure). More information can be found at CREATE STORAGE INTEGRATION.
Credentials
= drop-down
Select your AWS credentials. The special value, [Environment Default], uses the set of credentials specified in your Matillion ETL environment—this is the default value. Click Manage to edit or create new credentials in Manage Credentials.
Storage Integration
= drop-down
Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read Storage Integration Setup Guide.
Note
Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage regardless of the cloud provider that hosts your Snowflake account.
Warehouse
= drop-down
The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.
Database
= drop-down
The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
= drop-down
The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Table name
= string
The table or view to unload to S3.
Format
= drop-down
Choose from preset file formats available in your Snowflake database. Additional file formats can be created using the Create File Format component. Selecting the [Custom] file format will use the S3 Unload component's properties to define the file format.
File Type
= drop-down
Choose whether you would like Matillion ETL to unload the data in a CSV, JSON, or PARQUET format.
Compression
= drop-down
Whether the input file is compressed in gzip format, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, ZSTD or not compressed at all.
Nest Columns
= drop-down
This parameter is only available when the File Type parameter is set to "JSON".
Specify whether or not ("True" or "False") the table columns should be nested into a single JSON object so that the file can be configured correctly. A table with a single variant column will not require this setting to be "True".
Default is "False"
Record Delimiter
= string
The delimiter to be used that separates records (rows) in the file. Defaults to newline. \ can also signify a newline. \r can signify a carriage return.
Field Delimiter
= string
The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\ ".
Date Format
= string
Defaults to "auto". This can be used to manually specify a date format.
Time Format
= string
Defaults to "auto". This can be used to manually specify a time format.
Timestamp Format
= string
Defaults to "auto". This can be used to manually specify a timestamp format.
Escape
= string
(CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.
Escape Unenclosed Field
= string
(CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). Default is \
.
If a character is specified in the "Escape" field, it will override this field.
If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant, in which case it will be ignored.
Field Optionally Enclosed
= string
A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). The character chosen can be escaped by that same character.
Note
This MUST be set when unloading data that contains NULL values into CSV format .
Null If
= string
Specify a string to convert to SQL NULL values. In this field, users can specify a value that will be used when unloading the data to the chosen file. If an SQL NULL value is found, then the NULL value is replaced with the first value listed in this. For unloads, only the first string that you specify in this property is used.
Allow Overwrites
= drop-down
If the target file already exists, overwrite data instead of generating an error.
Single File
= drop-down
When True, the unload will work in serial rather than parallel. This results in a slower unload but a single, complete file.
The default setting is False.
When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether or not the file is compressed).
When False, a filename prefix must be included in the path.
Max File Size
= string
The maximum size (in bytes) of each file generated, per thread. Default is 16000000 bytes (16 MB) and Snowflake has a 6.2GB file limit for copy-into-location operations. Files that exceed the stated maximum will be split into multiple size-abiding parts.
Include Headers
= drop-down
If set to "True", write column names as headers at the top of the unloaded files.
Name
= string
A human-readable name for the component.
Credentials
= drop-down
Choose which AWS credentials to use. Read Manage Credentials to learn more.
S3 Object Prefix
= url
The name of the file for data to be unloaded into. When a user enters a forward slash character /
after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
Encryption
= drop-down
Decide how the files are encrypted inside the S3 bucket. None: No encryption. Client Side Encryption: Encrypt the data according to a client-side master key. Read Protecting data using client-side encryption to learn more. SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more. SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.
KMS Key ID
= drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Master Key
= drop-down
The ID of the client-side encryption key you have chosen to use in the Encryption property.
Catalog
= drop-down
Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.
Database
= drop-down
Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.
Table Name
= drop-down
The table to unload to Azure Blob Storage.
File Type
= drop-down
Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. Below properties will change to reflect the selected file type.
Compression
= drop-down
Set the compression type. The default is NONE. Setting does not apply to AVRO.
Record Delimiter
= string
(CSV only) A delimiter character used in delimited text formats. The default is a comma.
Quote
= string
(CSV only) Sets the quote character for delimited text formats to enclose values containing the delimiter character. The default is "
.
Date Format
= string
(CSV & JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd
.
Timestamp Format
= string
(CSV & JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX]
.
Escape
= string
(CSV only) Sets the escape character used in delimited text formats to escape special characters. The default is \
.
Null Value
= string
(CSV only) Sets the string representation of a null value in the input data. The default value is an empty string.
Header
= Boolean
(CSV only) Specify whether the input data has a header row. Default is "False".
Escape Quotes
= Boolean
(CSV only) Specify whether to escape quote characters. The default is "True".
Name
= string
A human-readable name for the component.
Schema
= drop-down
Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.
Note
An external schema is required if the Type property is set to External.
Table Name
= string
The table or view to unload to S3.
S3 URL Location
= string
The URL of the S3 bucket to load the data into.
Note
This component can unload to any accessible bucket, regardless of region.
When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
S3 Object Prefix
= string
Create data files in S3 beginning with this prefix. The format of the output is
<prefix><slice-number>_part_<file-number>
Where slice-number is the number of the slice in your cluster and file number (files larger than 6.2GB) will be split.
IAM Role ARN
= drop-down
Select an IAM role Amazon Resource Name (ARN) that is already attached to your Redshift cluster, and that has the necessary permissions to access S3.
This setting is optional, since without this style of setup, the credentials of the environment (instance credentials or manually entered access keys) will be used.
Read the Redshift documentation for more information about using a role ARN with Redshift.
Generate Manifest
= drop-down
Whether or not to generate a manifest file detailing the files that were added.
Note
Selecting the option Yes (Verbose) will create a manifest file that explicitly lists details for the data files created by the Unload process. For more information, please visit the Redshift documentation.
Data File Type
= drop-down
Choose the file type from: CSV, Delimited, Fixed Width, or Parquet.
Delimiter
= string
The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\ ".
Fixed Width Spec
= string
Loads the data from a file where each column width is a fixed length, rather than separated by a delimiter. Each column is described by a name and length, separated by a colon. Each described column is then separated by a comma.
e.g. We have four columns; name, id, age, state. These columns have the respective lengths; 12,8,2,2.
The written description to convert this data into a table using fixed-width columns would then be:
name:12,id:8,age:2,state:2_
Note that the columns can have any plaintext name. For more information on fixed width inputs, please consult the AWS documentation.
Compress Data
= drop-down
Whether or not the resultant files are to be compressed.
Compression Type
= drop-down
(If Compress Data is Yes) Select either gzip or Bzip2 as the compression method.
NULL As
= string
This option replaces the specified string with null in the output table. Use this is your data has a particular representation of missing data.
Escape
= drop-down
Whether or not to insert backslashes to escape special characters. This is often a good idea if you intend to re-load the data back into a table later, since the COPY also supports this option.
Allow Overwrites
= drop-down
If the target file already exists, overwrite data instead of generating an error.
Parallel
= drop-down
If set, the unload will work in parallel, creating multiple files (one for each slice of the cluster). Disabling parallel will result in a slower unload but a single, complete file.
Add quotes
= drop-down
If set, quotation marks are added to the data.
S3 Bucket Region
= drop-down
The Amazon S3 region hosting the S3 bucket. This is not normally required and can be left as "None" (default) if the bucket is in the same region as your Redshift cluster.
Max File Size
= string
The maximum size (in MB) of each file generated, per thread. Default is 16 MB and AWS has a 6.2GB file limit for Unload operations. Files that exceed the stated maximum will be split into multiple size-abiding parts.
Include Headers
= drop-down
If set to Yes, Matillion will write column names as headers at the top of unloaded files.
Note
This component property will not work in tandem with the Fixed Width component property.
Encryption
= drop-down
Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.
- None: No encryption.
- Client Side Encryption: Encrypt the data according to a client-side master key. Read Protecting data using client-side encryption to learn more.
- SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more.
- SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.
KMS Key ID
= drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Master Key
= drop-down
The ID of the client-side encryption key you have chosen to use in the Encryption property.
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
✅ | ✅ | ✅ | ❌ | ❌ |