CSV Load
CSV Load is an orchestration component that allows you to load data into a table from one or more CSV files from a source location. It automatically infers the schema from the structure of the selected file(s), creates the corresponding table, and loads the data into it.
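For orientation, the sequence the component performs can be sketched in terms of Snowflake SQL: infer a schema from the staged files, create the table, then load it with a COPY statement. The Python sketch below is illustrative only and is not the component's actual implementation; the connection parameters, stage, file format, and table names are hypothetical placeholders.

```python
# Illustrative only: CSV Load generates and runs the equivalent Snowflake SQL for you.
# All identifiers and credentials below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="MY_WH", database="MY_DB", schema="MY_SCHEMA",
)
cur = conn.cursor()

# 1. Infer the column definitions from the staged CSV files and create the table.
cur.execute("""
    CREATE TABLE IF NOT EXISTS MY_TABLE USING TEMPLATE (
        SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
        FROM TABLE(INFER_SCHEMA(
            LOCATION => '@MY_STAGE/Example/Path/',
            FILE_FORMAT => 'MY_CSV_FORMAT'))
    )
""")

# 2. Load the staged files into the table.
cur.execute("""
    COPY INTO MY_TABLE
    FROM @MY_STAGE/Example/Path/
    FILE_FORMAT = (FORMAT_NAME = 'MY_CSV_FORMAT')
""")
```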
Properties
Name
= string
A human-readable name for the component.
Connect
Type
= drop-down
- Snowflake Managed: Load your file from a Snowflake internal stage.
- S3: Load your file from an S3 bucket.
- Google Cloud Storage: Load your file from a Google Cloud Storage bucket.
- Azure Blob Storage: Load your file from an Azure Blob Storage container.
Stage
= drop-down
Select a staging area for the data. The special value [New Stage] will create a temporary stage to be used for loading the data when the corresponding parameter values are provided.
The options shown in this drop-down menu depend on the values selected for the Database and Schema parameters. If these parameter values change, the available options here will update accordingly, and the previously selected option may no longer be valid.
Staged File Path
= string (optional)
A stage may include directories. You can browse and select files from a specific directory path, for example /Example/Path.
If this is left blank, and if the Pattern parameter is also empty, all files from the stage will be loaded.
S3 Object Prefix
= file explorer
To retrieve the intended files, use the file explorer to enter the S3 bucket path that contains them, or select from the list of S3 buckets.
This must have the format S3://<bucket>/<path>.
Encryption
= drop-down
Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.
- None: No encryption.
- Client Side Encryption: Encrypt the data using a client-side master key. For more information, read Protecting data using client-side encryption.
- SSE KMS: Encrypt the data using a key stored in AWS KMS. For more information, read AWS Key Management Service (AWS KMS).
- SSE S3: Encrypt the data using server-side encryption with Amazon S3-managed keys. For more information, read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3).
KMS Key ID
= drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Only available when Encryption is set to SSE KMS.
Master Key
= drop-down
The secret definition denoting your master key for client-side encryption. The master key must be saved as a secret definition before using this component.
Only available when Encryption is set to Client Side Encryption.
Storage Integration
= drop-down
Select the storage integration. Storage integrations are required to permit Snowflake to read from and write to a cloud storage location. Integrations must be set up in advance and configured to support Google Cloud Storage. For more information, read Create storage integration.
Google Storage URL Location
= file explorer
To retrieve the intended files, use the file explorer to enter the Google Cloud Storage bucket path that contains them, or select from the list of Google Cloud Storage buckets.
This must have the format gs://<bucket>/<path>.
Azure Storage Location
= file explorer
To retrieve the intended files, use the file explorer to enter the container path within the Azure Storage account, or select from the list of storage accounts.
This must have the format azure://<STORAGEACCOUNT>/<path>.
Pattern
= string (optional)
A string used to partially match file paths and names to include in the load.
By default, the value is set to *.csv, which matches all CSV files in the location specified by the Stage and Staged File Path parameters. The pattern is applied to the full file path, not just the part relative to the directory defined in Staged File Path, so any subfolder containing the objects to load must be included in the pattern (see the matching example after the note below).
Note
- Any pattern entered assumes the file extension .csv.
- Only .csv files are supported. Ensure the files you select are valid CSV files.
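As a rough illustration of full-path matching, the sketch below uses Python's fnmatch as a stand-in for the component's pattern matching; the file paths are hypothetical and the component's actual matching engine may differ in detail.

```python
from fnmatch import fnmatch

# Hypothetical file paths as they appear in the stage.
files = [
    "landing/2024/orders_01.csv",
    "landing/2024/orders_02.csv",
    "archive/old_orders.csv",
]

# The default pattern "*.csv" matches every CSV file in the stage.
print([f for f in files if fnmatch(f, "*.csv")])

# A pattern that names the subfolder restricts the load to that folder,
# because matching is applied to the full path, not just the file name.
print([f for f in files if fnmatch(f, "landing/2024/*.csv")])
```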
Configure
File Format
= drop-down
The default value is [New File Format], meaning a temporary file format built from the settings below is used when the component runs. Alternatively, select a pre-made file format.
Compression
= drop-down
Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE.
The default setting is AUTO.
Record Delimiter
= string (optional)
Input a delimiter for records. This can be one or more single-byte or multibyte characters that separate records in an input file.
Accepted values include: leaving the field empty; a newline character \n or its hex equivalent 0x0a; a carriage return \r or its hex equivalent 0x0d. Also accepts a value of NONE.
If you set Skip Header to a value such as 1, use a record delimiter that includes a line feed or carriage return, such as \n or \r. Otherwise, your entire file will be interpreted as the header row, and no data will be loaded.
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
The default (if the field is left blank) is a newline character.
Field Delimiter
= string (optional)
Input a delimiter for fields. This can be one or more single-byte or multibyte characters that separate fields in an input file.
Accepted characters include common escape sequences, octal values (prefixed by \), or hex values (prefixed by 0x). Also accepts a value of NONE.
This delimiter is limited to a maximum of 20 characters.
While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb".
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
The default setting is a comma (,).
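As an aside, here is a sketch of what these two settings describe about the file layout: a hypothetical file using a pipe (|) as the field delimiter and a newline as the record delimiter, parsed with Python's csv module purely for illustration (the actual parsing is performed by Snowflake).

```python
import csv
import io

# Hypothetical file content: "|" as the field delimiter, newline as the
# record delimiter (the component defaults are "," and newline).
data = "id|name|amount\n1|Widget|9.99\n2|Gadget|14.50\n"

for row in csv.reader(io.StringIO(data), delimiter="|"):
    print(row)
# ['id', 'name', 'amount']
# ['1', 'Widget', '9.99']
# ['2', 'Gadget', '14.50']
```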
Parse Header
= boolean
Specify whether the first row header should be used to define the column names of the resulting table. For this option to work, the Match By Column Name parameter must be set to either CASE_SENSITIVE or CASE_INSENSITIVE.
The Skip Header option isn’t supported if this parameter is set to true.
The default setting is false.
Skip Header
= integer
Specify the number of rows to skip. The default is 0.
If Skip Header is used, the record delimiter is not used to determine where the header line ends. Instead, the specified number of CRLF-delimited lines is skipped. For example, if Skip Header = 1, the load skips to the first CRLF it finds. If the Record Delimiter property is set to a single character that does not include a CRLF, the load skips to the end of the file and treats the entire file as the header.
Skip Blank Lines
= boolean
When True, ignores blank lines that only contain a line feed in a data file and does not try to load them. Default setting is False.
Escape
= string (optional)
Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.
Escape Unenclosed Field
= string (optional)
Specify a single character to be used as the escape character for unenclosed field values only. Default is \\. If you have set a value in the Field Optionally Enclosed property, all fields become enclosed, so the Escape Unenclosed Field property is redundant and will be ignored.
Trim Space
= boolean
When True, removes leading and trailing whitespace from fields. Default setting is False.
Field Optionally Enclosed
= string (optional)
Specify a character used to enclose strings. The value can be NONE, the single quote character ('), or the double quote character ("). To specify the single quote character, use its octal (\047) or hex (0x27) representation, or the double single-quoted escape (''). Default is NONE.
When a field contains the enclosing character, escape it by doubling the character. For example, to escape a string such as 1 "2" 3 when the enclosure is a double quote, write 1 ""2"" 3.
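For illustration only, Python's csv module applies the same doubling rule when writing a field that contains the enclosing character; the row values below are hypothetical.

```python
import csv
import io

buf = io.StringIO()
# Enclose every field in double quotes; embedded quotes are doubled on write.
writer = csv.writer(buf, quotechar='"', quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["1", 'He said "hi"', "3"])
print(buf.getvalue())  # "1","He said ""hi""","3"
```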
Null If
= editor (optional)
Specify one or more strings (one string per row of the table) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.
Error On Column Count Mismatch
= boolean
When True, generates an error if the number of delimited columns in an input file does not match the number of columns in the corresponding table. When False (default), an error is not generated and the load continues. If the file is successfully loaded in this case:
- Where the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file, and the remaining fields are not loaded.
- Where the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.
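A minimal sketch of that behaviour when the option is False, assuming a hypothetical three-column table: extra fields are dropped and missing fields are padded with NULLs.

```python
# Hypothetical table columns and input rows.
table_columns = ["id", "name", "amount"]
rows = [
    ["1", "Widget", "9.99", "extra"],  # more fields than columns: extras dropped
    ["2", "Gadget"],                   # fewer fields than columns: padded with NULL
]

for row in rows:
    fitted = (row + [None] * len(table_columns))[: len(table_columns)]
    print(dict(zip(table_columns, fitted)))
# {'id': '1', 'name': 'Widget', 'amount': '9.99'}
# {'id': '2', 'name': 'Gadget', 'amount': None}
```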
Empty Field As Null
= boolean
When True, inserts NULL values for empty fields in an input file. This is the default setting.
Replace Invalid Characters
= boolean
When True, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character. When False (default), the load operation produces an error when invalid UTF-8 character encoding is detected.
Encoding Type
= drop-down
Select the string that specifies the character set of the source data when loading data into a table. Refer to the Snowflake documentation for more information.
Destination
Warehouse
= drop-down
The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.
Database
= drop-down
The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
= drop-down
The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Target Table Name
= string
Select an existing table to load data into. The tables available for selection depend on the chosen schema.
Load Strategy
= string
- Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run. Ensure that COPY GRANTS is applied when replacing the table so that existing access grants are preserved (see the sketch after this list).
- Truncate and Insert: If the specified table name already exists, all rows within the table will be removed and new rows will be inserted per the next run of this pipeline.
- Fail if Exists: If the specified table name already exists, this pipeline will fail to run, and no data will be copied to the table.
- Append: If the specified table name already exists, the data is inserted without altering or deleting the existing data in the table; it is appended onto the end of the existing data. If the specified table name doesn't exist, the table will be created and your data inserted into it. For example, if your source holds 100 records, the first pipeline run creates the target table and inserts 100 rows. The second run appends the same 100 records, so the table then holds 200 records; after the third run it holds 300 records, and so on.
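Conceptually, each strategy maps onto familiar Snowflake statements. The sketch below is a hedged approximation, not the SQL the component literally generates; the table and stage names are hypothetical.

```python
# Rough, conceptual mapping of load strategies to Snowflake SQL.
# MY_TABLE and MY_STAGE are hypothetical; the component's generated SQL may differ.
load_strategy_sql = {
    "Replace": (
        "CREATE OR REPLACE TABLE MY_TABLE (...) COPY GRANTS; "  # grants preserved
        "COPY INTO MY_TABLE FROM @MY_STAGE"
    ),
    "Truncate and Insert": (
        "TRUNCATE TABLE MY_TABLE; "
        "COPY INTO MY_TABLE FROM @MY_STAGE"
    ),
    "Fail if Exists": "-- the run fails if MY_TABLE already exists; nothing is copied",
    "Append": "COPY INTO MY_TABLE FROM @MY_STAGE  -- existing rows are kept",
}

for strategy, sql in load_strategy_sql.items():
    print(f"{strategy}: {sql}")
```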
Advanced Settings
On Error
= drop-down
Decide how to proceed upon an error.
- Abort Statement: Aborts the load if any error is encountered. This is the default setting.
- Continue: Continue loading the file.
- Skip File: Skip file if any errors are encountered in the file.
- Skip File When n Errors: Skip file when the number of errors in the file is equal to or greater than the specified number in the next property, n.
- Skip File When n% Errors: Skip the file when the percentage of errors in the file exceeds the specified percentage, n.
n
= integer
Specify the number of errors or the percentage of errors required to skip the file. This parameter only accepts integers; the % symbol is not accepted. Specify percentages as a number only.
Size Limit (in bytes)
= integer (optional)
Specify the maximum size, in bytes, of data to be loaded for a given COPY statement. If the maximum is exceeded, the COPY operation discontinues loading files. For more information, refer to the Snowflake documentation.
Purge Files
= boolean
When True, purges data files after the data is successfully loaded. Default setting is False.
Match By Column Name
= drop-down
Specify whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.
- Case Insensitive: Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column name matching is case-insensitive.
- Case Sensitive: Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column name matching is case-sensitive.
- None: The COPY operation loads the semi-structured data into a variant column or, if a query is included in the COPY statement, transforms the data.
Truncate Columns
= boolean
When True, strings are automatically truncated to the target column length. When False (default), the COPY statement produces an error if a loaded string exceeds the target column length.
Force Load
= boolean
When True, loads all files, regardless of whether they have been loaded previously and haven't changed since they were loaded. Default setting is False.
When set to True, this option reloads files and can lead to duplicated data in a table.
| Snowflake | Databricks | Amazon Redshift |
|---|---|---|
| ✅ | ❌ | ❌ |