Google Cloud Storage Load

The Google Cloud Storage Load orchestration component lets users load data stored on the Google Cloud Storage service into an existing Snowflake table.

This component requires working Google Cloud Storage credentials with "read" access to the source data files.


Properties

Name = string

A human-readable name for the component.


Stage = drop-down

Select a staging area for the data. Staging areas can be created through Snowflake using the CREATE STAGE command. Internal stages can be set up this way to store staged data within Snowflake. Selecting [Custom] will allow the user to specify a custom staging area.


Storage Integration = drop-down

Select the storage integration. Storage integrations are required to permit Snowflake to read from and write to a cloud storage location. Integrations must be set up in advance and configured to support Google Cloud Storage.


Google Storage URL Location = file explorer

Use the file explorer to enter the path to the Google Cloud Storage bucket that holds the files to load, or select a bucket from the list of GCS buckets.

This must have the format GS://<bucket>/<path>.
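As a rough illustration of the expected location format, the following Python sketch splits a location into its bucket and path parts. The function name `split_gcs_url` and the example bucket are hypothetical and are not part of the component.

```python
from typing import Tuple


def split_gcs_url(url: str) -> Tuple[str, str]:
    """Split a GS://<bucket>/<path> location into (bucket, path)."""
    scheme, sep, rest = url.partition("://")
    # Accept the scheme in either case, since the format above is written as GS://.
    if sep == "" or scheme.lower() != "gs":
        raise ValueError(f"expected a GS:// location, got {url!r}")
    bucket, _, path = rest.partition("/")
    if not bucket:
        raise ValueError("bucket name is missing")
    return bucket, path


# Hypothetical bucket and path, for illustration only.
print(split_gcs_url("GS://example-bucket/exports/2024/"))
# → ('example-bucket', 'exports/2024/')
```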


Pattern = string

A regular expression pattern string that specifies the file names and/or paths to match. For more information on pattern matching, read the Snowflake documentation.
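To get a feel for how a pattern selects files, the sketch below tests a pattern against a few hypothetical file paths using Python's `re` module. It assumes the pattern must match the whole path, and Python's regular expression dialect may differ from Snowflake's in some details, so treat it as an approximation only.

```python
import re

# Hypothetical pattern: any path containing "sales_" that ends in ".csv".
pattern = r".*sales_.*[.]csv"

# Hypothetical file paths, for illustration only.
paths = [
    "exports/2024/sales_01.csv",
    "exports/2024/sales_01.csv.bak",
    "exports/2024/returns_01.csv",
]

# fullmatch requires the pattern to cover the entire path.
matched = [p for p in paths if re.fullmatch(pattern, p)]
print(matched)
# → ['exports/2024/sales_01.csv']
```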


Warehouse = drop-down

The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Target Table = string

Select an existing table to load data into. The tables available for selection depend on the chosen schema.


Load Columns = dual listbox

Choose the columns to load. If you leave this parameter empty, all columns will be loaded.


Format = drop-down

Select a pre-made file format that will automatically set many of the Google Cloud Storage Load component properties. These formats can be created through the Create File Format component. Select [Custom] to specify a custom format using the properties available in this component.


File Type = drop-down

Select the type of data to load. Available file types are: AVRO, CSV, JSON, ORC, PARQUET, and XML. For additional information on file type options, read the Snowflake documentation.

Component properties will change to reflect the selected file type. The properties applicable to each file type are listed below.

AVRO properties

Compression = drop-down

Select the compression method used by the data files being loaded. If the files are not compressed, select NONE. The default setting is AUTO.


Trim Space = boolean

When Yes, removes whitespace from fields. Default setting is No.


Null If = editor

Specify one or more strings (one string per row in the dialog) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.

CSV properties

Compression = drop-down

Select the compression method used by the data files being loaded. If the files are not compressed, select NONE. The default setting is AUTO.


Record Delimiter = string

Input a delimiter for records. This can be one or more single-byte or multibyte characters that separate records in an input file.

Accepted values include: leaving the field empty; a newline character \n or its hex equivalent 0x0a; a carriage return \r or its hex equivalent 0x0d. Also accepts a value of NONE.

If you set the Skip Header property to a value such as 1, then you should use a record delimiter that includes a line feed or carriage return, such as \n or \r. Otherwise, your entire file will be interpreted as the header row, and no data will be loaded.

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.

If the field is left blank, the default record delimiter is a newline character.


Field Delimiter = string

Input a delimiter for fields. This can be one or more single-byte or multibyte characters that separate fields in an input file.

Accepted characters include common escape sequences, octal values (prefixed by \), or hex values (prefixed by 0x). Also accepts a value of NONE.

This delimiter is limited to a maximum of 20 characters.

While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb".

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.

The default setting is a comma (,).
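The length and substring rules for delimiters described above can be checked before a load is configured. The following Python sketch is one way to do that; `check_delimiters` is a hypothetical helper, not something the component provides.

```python
from typing import List


def check_delimiters(field_delimiter: str, record_delimiter: str) -> List[str]:
    """Return a list of problems found with a delimiter pair (empty if none)."""
    problems = []
    if len(field_delimiter) > 20:
        problems.append("field delimiter is longer than 20 characters")
    # Only compare when both values are set; an empty string is a
    # substring of every string and would give a false positive.
    if field_delimiter and record_delimiter:
        if field_delimiter in record_delimiter or record_delimiter in field_delimiter:
            problems.append("one delimiter is a substring of the other")
    return problems


print(check_delimiters("aa", "aabb"))
# → ['one delimiter is a substring of the other']
print(check_delimiters(",", "\n"))
# → []
```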


Skip Header = integer

Specify the number of rows to skip. The default is 0.

If Skip Header is used, the value of the record delimiter will not be used to determine where the header line is. Instead, the specified number of CRLF-terminated lines will be skipped. For example, if Skip Header = 1, the load skips to the first CRLF that it finds. If you have set the Field Delimiter property to a single character without a CRLF, the load skips to the end of the file (treating the entire file as a header).


Skip Blank Lines = boolean

When Yes, ignores blank lines that only contain a line feed in a data file and does not try to load them. Default setting is No.


Date Format = string

Define the format of date values in the data files to be loaded. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT session parameter is used. The default setting is AUTO.


Time Format = string

Define the format of time values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT session parameter is used. The default setting is AUTO.


Timestamp Format = string

Define the format of timestamp values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT session parameter is used. The default setting is AUTO.


Escape = string

Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.


Escape Unenclosed Field = string

Specify a single character to be used as the escape character for unenclosed field values only. Default is \\. If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant, in which case, it will be ignored.


Trim Space = boolean

When Yes, removes whitespace from fields. Default setting is No.


Field Optionally Enclosed = string

Specify a character used to enclose strings. The value can be NONE, single quote character ', or double quote character ". To use the single quote character, use the octal or hex representation 0x27 or the double single-quoted escape ''. Default is NONE.

When a field contains the enclosing character, escape it by doubling it. For example, if fields are enclosed in double quotes and a field contains the string 1 "2" 3, write the field as 1 ""2"" 3.
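The doubled-quote convention is the same one used by Python's standard `csv` module, so a short sketch can show what an enclosed field looks like in a file. This only illustrates the file format; it says nothing about how Snowflake parses the file internally.

```python
import csv
import io

# Write one row with every field enclosed in double quotes.
buf = io.StringIO()
writer = csv.writer(
    buf,
    quotechar='"',
    doublequote=True,       # escape an embedded quote by doubling it
    quoting=csv.QUOTE_ALL,  # enclose every field
    lineterminator="\n",
)
writer.writerow(['1 "2" 3', "plain"])
encoded = buf.getvalue()
print(encoded, end="")
# → "1 ""2"" 3","plain"

# Reading the line back recovers the original field values.
rows = list(csv.reader(io.StringIO(encoded), quotechar='"', doublequote=True))
print(rows)
# → [['1 "2" 3', 'plain']]
```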


Null If = editor

Specify one or more strings (one string per row in the dialog) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.
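The effect of Null If can be pictured with a small Python sketch in which `None` stands in for an SQL NULL. The helper name `apply_null_if` and the sample values are hypothetical.

```python
from typing import List, Optional


def apply_null_if(row: List[str], null_if: List[str]) -> List[Optional[str]]:
    """Replace any field that exactly equals a Null If string with None."""
    markers = set(null_if)
    return [None if value in markers else value for value in row]


# "N/A" and "NULL" are configured as Null If strings; the empty string is not.
print(apply_null_if(["42", "N/A", "", "NULL"], ["N/A", "NULL"]))
# → ['42', None, '', None]
```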


Error On Column Count Mismatch = boolean

When Yes, generates an error if the number of delimited columns in an input file does not match the number of columns in the corresponding table. When No (default), an error is not generated and the load continues. If the file is successfully loaded in this case:

  • Where the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file, and the remaining fields are not loaded.
  • Where the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.
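The two cases above can be modelled in a few lines of Python. This is a sketch of the described behaviour only, with `None` standing in for NULL; `fit_row` is a hypothetical name.

```python
from typing import List, Optional


def fit_row(
    fields: List[str], column_count: int, error_on_mismatch: bool = False
) -> List[Optional[str]]:
    """Fit one record's fields to a table with column_count columns."""
    if len(fields) != column_count and error_on_mismatch:
        raise ValueError(
            f"expected {column_count} fields, found {len(fields)}"
        )
    # Extra fields are dropped; missing columns are filled with None (NULL).
    kept = fields[:column_count]
    return kept + [None] * (column_count - len(kept))


print(fit_row(["a", "b", "c", "d"], 3))
# → ['a', 'b', 'c']
print(fit_row(["a"], 3))
# → ['a', None, None]
```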

Empty Field As Null = boolean

When Yes, inserts NULL values for empty fields in an input file. This is the default setting.


Replace Invalid Characters = boolean

When Yes, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character. When No (default), the load operation produces an error when invalid UTF-8 character encoding is detected.


Encoding Type = drop-down

Select the string that specifies the character set of the source data when loading data into a table. Refer to the Snowflake documentation for more information.

JSON properties

Compression = drop-down

Select the compression method used by the data files being loaded. If the files are not compressed, select NONE. The default setting is AUTO.


Trim Space = boolean

When Yes, removes whitespace from fields. Default setting is No.


Null If = editor

Specify one or more strings (one string per row in the dialog) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.


Enable Octal = boolean

When Yes, enables the parsing of octal values. Default setting is No.


Allow Duplicates = boolean

When Yes, allows duplicate object field names. Default setting is No.


Strip Outer Array = boolean

When Yes, instructs the JSON parser to remove outer brackets. Default setting is No.
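As an illustration of what removing the outer brackets means for a file that holds one large JSON array, the sketch below returns the documents that would each become a separate row. It is an assumption-based model of the behaviour, and `split_documents` is a hypothetical name.

```python
import json
from typing import Any, List


def split_documents(text: str, strip_outer_array: bool) -> List[Any]:
    """Return the JSON documents that would each be loaded as one row."""
    parsed = json.loads(text)
    if strip_outer_array and isinstance(parsed, list):
        # Outer brackets removed: every array element is its own document.
        return parsed
    # Otherwise the whole value is treated as a single document.
    return [parsed]


source = '[{"id": 1}, {"id": 2}]'
print(len(split_documents(source, strip_outer_array=True)))
# → 2
print(len(split_documents(source, strip_outer_array=False)))
# → 1
```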


Strip Null Values = boolean

When Yes, instructs the JSON parser to remove any object fields or array elements containing NULL values. Default setting is No.


Ignore UTF8 Errors = boolean

When Yes, replaces any invalid UTF-8 sequences with the Unicode replacement character. When No (default), invalid UTF-8 sequences produce an error in the pipeline run.

ORC properties

Trim Space = boolean

When Yes, removes whitespace from fields. Default setting is No.


Null If = editor

Specify one or more strings (one string per row in the dialog) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.

PARQUET properties

Compression = drop-down

Select the compression method used by the data files being loaded. If the files are not compressed, select NONE. The default setting is AUTO.


Trim Space = boolean

When Yes, removes whitespace from fields. Default setting is No.


Null If = editor

Specify one or more strings (one string per row in the dialog) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.

XML properties

Compression = drop-down

Select the compression method used by the data files being loaded. If the files are not compressed, select NONE. The default setting is AUTO.


Ignore UTF8 Errors = boolean

When Yes, replaces any invalid UTF-8 sequences with the Unicode replacement character. When No (default), invalid UTF-8 sequences produce an error in the pipeline run.


Preserve Space = boolean

When Yes, the XML parser preserves leading and trailing spaces in element content. Default setting is No.


Strip Outer Element = boolean

When Yes, the XML parser strips out any outer XML elements, exposing second-level elements as separate documents. Default setting is No.


Disable Snowflake Data = boolean

When Yes, the XML parser will not recognise Snowflake semi-structured data tags. Default setting is No.


Disable Auto Convert = boolean

When Yes, the XML parser will disable automatic conversion of numeric and Boolean values from text to native representations. Default setting is No.


On Error = drop-down

Decide how to proceed upon an error.

  • Abort Statement: Aborts the load if any error is encountered.
  • Continue: Continue loading the file.
  • Skip File: Skip file if any errors are encountered in the file.
  • Skip File When n Errors: Skip file when the number of errors in the file is equal to or greater than the specified number in the next property, n.
  • Skip File When n% Errors: Skip file when the percentage of errors in the file exceeds the specified percentage of n.

Default setting is Abort Statement.


n = integer

Specify the number of errors or the percentage of errors required for the load to skip the file. Only used when On Error is set to Skip File When n Errors or Skip File When n% Errors.

This property only accepts integer values. Specify percentages as a number only, without the % symbol.
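The On Error choices and the n property correspond closely to the ON_ERROR copy option values listed in Snowflake's COPY INTO documentation (ABORT_STATEMENT, CONTINUE, SKIP_FILE, SKIP_FILE_<num>, and 'SKIP_FILE_<num>%'). The Python sketch below shows one plausible mapping. How the component builds its statement internally is not documented here, so the function `on_error_clause` and the mapping itself are assumptions.

```python
from typing import Optional


def on_error_clause(option: str, n: Optional[int] = None) -> str:
    """Build an ON_ERROR clause from a drop-down choice (assumed mapping)."""
    simple = {
        "Abort Statement": "ABORT_STATEMENT",
        "Continue": "CONTINUE",
        "Skip File": "SKIP_FILE",
    }
    if option in simple:
        return f"ON_ERROR = {simple[option]}"
    if option not in ("Skip File When n Errors", "Skip File When n% Errors"):
        raise ValueError(f"unknown On Error option: {option!r}")
    # The two remaining options need n: an integer with no % symbol.
    if n is None or n < 0:
        raise ValueError("n must be a non-negative integer")
    if option == "Skip File When n Errors":
        return f"ON_ERROR = SKIP_FILE_{n}"
    return f"ON_ERROR = 'SKIP_FILE_{n}%'"


print(on_error_clause("Skip File When n Errors", 10))
# → ON_ERROR = SKIP_FILE_10
print(on_error_clause("Skip File When n% Errors", 5))
# → ON_ERROR = 'SKIP_FILE_5%'
```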


Size Limit (B) = integer

Specify the maximum size, in bytes, of data to be loaded for a given COPY statement. If the maximum is exceeded, the COPY operation discontinues loading files. For more information, read the Snowflake documentation.


Purge Files = boolean

Select Yes to purge data files after the data is successfully loaded. Default setting is No.


Truncate Columns = boolean

  • Yes: The component will automatically truncate strings to the target column length.
  • No: The COPY statement produces an error if a loaded string exceeds the target column length.

Default setting is No.
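The difference between the two Truncate Columns settings can be sketched in Python as follows. The helper `load_string` is hypothetical and measures length in characters, which is a simplification.

```python
def load_string(value: str, max_length: int, truncate_columns: bool = False) -> str:
    """Model loading one string into a column with a maximum length."""
    if len(value) <= max_length:
        return value
    if truncate_columns:
        # Yes: silently cut the string down to the column length.
        return value[:max_length]
    # No: the load reports an error instead.
    raise ValueError(
        f"string of length {len(value)} exceeds column length {max_length}"
    )


print(load_string("abcdefgh", 5, truncate_columns=True))
# → abcde
```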


Force Load = boolean

Select Yes to load all files, regardless of whether they have been loaded previously and haven't changed since they were loaded. This option reloads files and can lead to duplicated data in a table.

Default setting is No.


Metadata Fields = dual listbox

Snowflake metadata columns available to include in the load.

Snowflake automatically generates metadata for files in internal stages (i.e. Snowflake) and external stages (Google Cloud Storage, Microsoft Azure, or Amazon S3). This metadata is "stored" in virtual columns. These metadata columns are added to the staged data, but are only added to the table when included in a query of the table. For more information, read Querying Metadata for Staged Files.

