Skip to content

Cloud Storage Unload

The Cloud Storage Unload component writes files from a table into a specified Google Cloud Storage (GCS) bucket.

For Google BigQuery users: this component cannot be used to unload views to Google BigQuery. Users wanting to unload views should first create a table with that view's metadata using a Create Table component. Next, use a Table Input component to select your view, then connect it to a Table Output component to copy the data to the new table. Finally, use your new table in the Cloud Storage Unload component.

For Snowflake users: this component can be used to unload views to your Snowflake data warehouse.


Properties

Name = string

A human-readable name for the component.


Stage = drop-down

Select your stage. To learn more about stages in Matillion ETL, read Manage Stages.


Google Storage URL Location = string

Input, or select via the file path tree, the URL of the Google Storage bucket from which to retrieve files. This follows a format such as gs://<bucket-name>/<path>.

When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.


File Prefix = string

A prefix added to files during the unload. The default prefix is data


Storage Integration = drop-down

Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read Storage Integration Setup Guide.

Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage regardless of the cloud provider that hosts your Snowflake account.


Warehouse = drop-down

The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Target Table = string

Choose a target table to unload data from into a GCS bucket.

Warning

This table will be recreated and will drop any existing table of the same name.


Format = drop-down

A named file format that describes a set of staged data to access or load into Snowflake tables. The default is [Custom].


File Type = drop-down

The file type for the file format. Available file types are: CSV, JSON, and PARQUET.

Component properties beneath this property will change depending on the chosen file type. Matillion ETL provides settings based on the specific file type.

For additional information on file type options, please refer to the Snowflake documentation.


Compression = drop-down

Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO. For more information, please refer to the relevant Snowflake documentation per your file type:


Nest Columns = drop-down

(JSON only) When "True", the table columns will be nested into a single JSON object so that the file can be configured correctly. A table with a single variant column will not require this setting to be "True". The default setting is "False".


Record Delimiter = string

(CSV only) Input a delimiter for records. This can be one or more single-byte or multibyte characters that separate records in an input file.

Note

  • Accepted values include: leaving the field empty; a newline character \ or its hex equivalent 0x0a; a carriage return \r or its hex equivalent 0x0d. Also accepts a value of NONE.
  • If you set the Skip Header to a value such as 1, then you should use a record delimiter that includes a line feed or carriage return, such as \ or \\r. Otherwise, your entire file will be interpreted as the header row, and no data will be loaded.
  • The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
  • Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
  • The default (if the field is left blank) is a newline character.
  • This delimiter is limited to a maximum of 20 characters.
  • While multi-character delimiters are supported, the record delimiter cannot be a substring of the field delimiter, and vice versa. For example, if the record delimiter is "aa", the field delimiter cannot be "aabb".

Field Delimiter = string

(CSV only) Input a delimiter for fields. This can be one or more singlebyte or multibyte characters that separate fields in a file.

Note

  • Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). Also accepts a value of NONE.
  • This delimiter is limited to a maximum of 20 characters.
  • While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb".
  • The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
  • Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
  • Default setting is a comma (,).

Date Format = string

(CSV only) Specify the date format. Snowflake accepts dates in the most common forms, including YYYY-MM-DD and DD-MON-YYYY among others. Additionally, all accepted timestamps are valid inputs for dates.

Please refer to the Snowflake documentation for more information about date and time formats.


Time Format = string

(CSV only) Specify the time format. Snowflake supports a single TIME data type, for storing times in the form of HH:MI:SS. TIME supports an optional precision parameter for fractional seconds, e.g. TIME(3). Time precision ranges between 0 (seconds) to 9 (nanoseconds). The default precision is 9.

Please refer to the Snowflake documentation for more information about date and time formats.


Timestamp Format = string

(CSV only) Specify the timestamp format. Snowflake supports three variations of timestamp: TIMESTAMP_LTZ, TIMESTAMP_NTZ, and TIMESTAMP_TZ. To learn more, please consult the Snowflake documentation.


Escape = string

(CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.


Escape Unenclosed Field = string

(CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). Default is \.

If a character is specified in the "Escape" field, it will override this field.

If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant, in which case it will be ignored.


Field Optionally Enclosed = string

(CSV only) Specify a character used to enclose strings. The value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape (''). Default is NONE.

When a field contains one of these characters, escape the field using the same character. For example, to escape a string like this: 1 "2" 3, use double quotation to escape, like this: 1 ""2"" 3.


Null If = string

Specify one or more strings (one string per row of the table) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with a SQL NULL value for that field in the loaded table. Click + to add a string.


Trim Space = drop-down

Removes trailing and leading whitespace from the input data.


Overwrite = drop-down

When "True", overwrite existing data (if the target file already exists) instead of generating an error. Default setting is "False".


Single File = drop-down

When True, the unload operation will work in serial rather than parallel. This results in a slower unload but a single, complete file.

The default setting is False.

When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether the file is compressed).

When False, a filename prefix must be included in the path.


Max File Size = integer

A number, greater than 0, that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread.

The actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing.


Include Headers = drop-down

When "True", write column names as headers at the top of the unloaded files. Default is "False".

Name = string

A human-readable name for the component.


Project = drop-down

Select the Google Cloud project. The special value, [Environment Default], will use the project defined in the environment. For more information, read Creating and managing projects.


Dataset = drop-down

Select the Google BigQuery dataset to load data into. The special value, [Environment Default], will use the dataset defined in the environment. For more information, read Introduction to datasets.


Target Table = string

Select the table from which data will be unloaded to the GCS bucket.


Google Storage URL Location = string

Select the Google Cloud Storage bucket. Users can click through the file tree, or use the URL template: gs://<bucket>/<path>.

When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.


Output Object Name string

Specify a name for the output object (the object that will be created in the chosen GCS bucket).


Format = drop-down

Select the format of the data. Users can select one of: AVRO, CSV, JSON (New line delimited).


Include Header = drop-down

(CSV format only) Select "Yes" to add a header line to the top of each file that has a column name. The default setting is "Yes".


Compression = drop-down

(AVRO format only) Select the AVRO file format compression type. Options include: Deflate, Snappy, or no compression (None).

(CSV, JSON formats only) Select whether or not output files are to be compressed via gzip compression.


Delimiter = string

(CSV format only) Specify a delimiter character to separate columns. The default value is a comma ,.

A [TAB] character can be specified as "/t".


Snowflake Delta Lake on Databricks Amazon Redshift Google BigQuery Azure Synapse Analytics