S3 Unload
S3 Unload creates files on a specified S3 bucket, and loads the bucket with data from a table or view.
To access an S3 bucket from a different AWS account, read Background: Cross-account permissions and using IAM roles.
Note
If you're using a Matillion Full SaaS solution, you may need to allow these IP address ranges from which Matillion Full SaaS agents will call out to their source systems or to cloud data platforms.
Properties
Name
= string
A human-readable name for the component.
Stage
= drop-down
Select a staging area for the data. Staging areas can be created through Snowflake using the CREATE STAGE command. Internal stages can be set up this way to store staged data within Snowflake. Selecting [Custom] will avail the user of properties to specify a custom staging area on S3. Users can add a fully qualified stage by typing the stage name. This should follow the format databaseName.schemaName.stageName
Authentication
= drop-down
Select the authentication method. Users can choose either:
- Credentials: Uses AWS security credentials.
- Storage Integration: Use a Snowflake storage integration. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of permitted or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage). More information can be found at CREATE STORAGE INTEGRATION.
Storage Integration
= drop-down
Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.
S3 Object Prefix
= file explorer
To retrieve the intended files, use the file explorer to enter the container path where the S3 bucket is located, or select from the list of S3 buckets.
This must have the format S3://<bucket>/<path>
.
File Prefix
= string
Filename prefix for unloaded data to be named on the S3 bucket. Each file will be named as the prefix followed by a number denoting which node this was unloaded from. All unloads are parallel and will use the maximum number of nodes available at the time.
Encryption
= drop-down
Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.
- None: No encryption.
- Client Side Encryption: Encrypt the data according to a client-side master key. Read Protecting data using client-side encryption to learn more.
- SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more.
- SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.
KMS Key ID
= drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Only available when encryption is set to KMS Encryption.
Master Key
= drop-down
The secret definition denoting your master key for client-side encryption. Your password should be saved as a secret definition before using this component.
Only available when encryption is set to Client Side Encryption.
Warehouse
= drop-down
The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.
Database
= drop-down
The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
= drop-down
The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Target Table
= drop-down
Select an existing table. The tables available for selection depend on the chosen schema.
Format
= drop-down
Select a pre-made file format that will automatically set many of the S3 Load component properties. These formats can be created through the Create File Format component.
File Type
= drop-down
Available data types are: CSV, JSON, or PARQUET. Some file types may require additional formatting—this is explained in the Snowflake documentation. Component properties will change to reflect the selected file type.
Compression
= drop-down
Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO.
Nest Columns
= drop-down
JSON only. When True, the table columns should be nested into a single JSON object so that the file can be configured correctly. A table with a single variant column will not require this setting to be True. Default is False.
Record Delimiter
= string
CSV only. Input a delimiter for records. This can be one or more single-byte or multibyte characters that separate records in an input file.
Accepted values include: leaving the field empty; a newline character \
or its hex equivalent 0x0a
; a carriage return \\r
or its hex equivalent 0x0d
. Also accepts a value of NONE.
If you set the Skip Header to a value such as 1, then you should use a record delimiter that includes a line feed or carriage return, such as \
or \\r
. Otherwise, your entire file will be interpreted as the header row, and no data will be loaded.
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
The default (if the field is left blank) is a newline character.
Field Delimiter
= string
CSV only. Input a delimiter for fields. This can be one or more single-byte or multibyte characters that separate fields in an input file.
Accepted characters include common escape sequences, octal values (prefixed by \), or hex values (prefixed by 0x). Also accepts a value of NONE.
This delimiter is limited to a maximum of 20 characters.
While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb".
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
The Default setting is a comma: ,
.
Date Format
= string
CSV only. Define the format of date values in the data files to be loaded. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT session parameter is used. The default setting is AUTO.
Time Format
= string
CSV only. Define the format of time values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT session parameter is used. The default setting is AUTO.
Timestamp Format
= string
CSV only. Define the format of timestamp values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT session parameter is used.
Escape
= string
CSV only. Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.
Escape Unenclosed Field
= string
CSV only. Specify a single character to be used as the escape character for unenclosed field values only. Default is \\
. If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant, in which case, it will be ignored.
Field Optionally Enclosed
= string
CSV only. Specify a character used to enclose strings. The value can be NONE, single quote character '
, or double quote character "
. To use the single quote character, use the octal or hex representation 0x27
or the double single-quoted escape ''
. Default is NONE.
When a field contains one of these characters, escape the field using the same character. For example, to escape a string like this: 1 "2" 3, use double quotation to escape, like this: 1 ""2"" 3.
Null If
= editor
Specify a string to convert to SQL NULL values.
Overwrite
= drop-down
If the target file already exists, overwrite data instead of generating an error.
Single File
= boolean
When True, the unload will work in serial rather than parallel. This results in a slower unload but a single, complete file. The default setting is False.
When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether or not the file is compressed). When False, a filename prefix must be included in the path.
Max File Size
= integer
The maximum size (in bytes) of each file generated, per thread. Default is 16000000 bytes (16 MB) and Snowflake has a 6.2GB file limit for copy-into-location operations. Files that exceed the stated maximum will be split into multiple size-abiding parts.
Include Headers
= boolean
When true, write column names as headers at the top of the unloaded files.
Snowflake | Databricks | Amazon Redshift |
---|---|---|
✅ | ❌ | ❌ |