Create External Table
This component lets users create an "external" table that references externally stored data. Because the table only references the data rather than holding it, external tables can be queried but are read-only. To learn more, read Working with External Tables.
External tables require an external stage through which to access the data. External stages can be managed via the Manage Stages menu.
Referencing externally held data is valuable for querying large datasets without storing that same volume of data in Snowflake.
Data loaded into these tables takes the form of a single variant-type column named "VALUE". Each column of target data becomes a property within the variant on its respective row, so the data will usually require some transformation to reach a desirable format. See also: Flatten Variant.
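As an illustration of the transformation involved, a query like the following pulls typed columns out of the VALUE variant. The table and property names here are hypothetical; for CSV sources, Snowflake exposes the source columns as properties named c1, c2, and so on:

```sql
-- Hypothetical external table "flights_ext" over CSV files.
-- Each row's VALUE variant holds the source columns as c1, c2, ...
SELECT
    value:c1::VARCHAR AS carrier,
    value:c2::DATE    AS flight_date,
    value:c3::NUMBER  AS passengers
FROM flights_ext;
```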
Whenever the externally held data changes, the external table must be refreshed using the Refresh External Table component before it is used. We recommend always refreshing an external table before using it in a job.
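In Snowflake SQL, the operation performed by the Refresh External Table component corresponds to a statement of this form (the table name is illustrative):

```sql
-- Re-scan the stage location so the external table metadata
-- picks up new, changed, or removed files.
ALTER EXTERNAL TABLE flights_ext REFRESH;
```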
Properties
Name
= string
A human-readable name for the component.
Create/Replace
= drop-down
- Create: Create the new table with the given name. Will fail if a table of that name already exists.
- Create if not exists: Will create the new table with the given name unless one already exists. Will succeed and continue in either case.
- Replace: Will create the new table, potentially overwriting any existing table of the same name.
Database
= drop-down
The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
= drop-down
The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
New Table Name
= string
The name of the external table to be created or used.
Partition Columns
= column editor
A partition is defined by an expression applied to a set of data, resulting in a partition that only contains rows that satisfy the expression. For example, partitioning a dataset by the value in the 'year' column.
For more information about setting up partitions with Snowflake's external tables, read Managing Regular Data Loads.
- Name: The name of the new partition column.
- Type: The data type of the partition column.
- Size: The data size of the new column values.
- Precision: The precision of the new column values.
- Expression: The expression used to partition the data.
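For illustration, the SQL that Snowflake accepts for a partitioned external table looks like the following sketch. All names and the path layout are assumptions; the partition expression typically derives its value from the staged file path via the METADATA$FILENAME pseudo-column:

```sql
-- Sketch of a partitioned external table, assuming staged files are
-- laid out as flights/<year>/<file>.csv (names are illustrative).
CREATE EXTERNAL TABLE flights_ext (
    year NUMBER(4) AS (TO_NUMBER(SPLIT_PART(METADATA$FILENAME, '/', 2)))
)
PARTITION BY (year)
LOCATION = @my_ext_stage/flights/
FILE_FORMAT = (TYPE = CSV)
AUTO_REFRESH = FALSE;
```

Queries that filter on the partition column (for example, `WHERE year = 2023`) then only scan the files in the matching partition.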
Stage Database
= drop-down
Select a Snowflake database for the external stage.
Stage Schema
= drop-down
Select a Snowflake schema for the external stage.
Stage
= drop-down
Select an external stage for the data. Staging areas can be managed via the Manage Stages menu or created through Snowflake using the CREATE STAGE command.
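Outside Matillion, an external stage can be created directly in Snowflake with the CREATE STAGE command. A minimal sketch, with a placeholder bucket URL and credentials:

```sql
-- Illustrative external stage pointing at an S3 bucket.
-- Replace the URL and credentials with real values.
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '<key-id>' AWS_SECRET_KEY = '<secret-key>');
```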
Relative Path
= string
The directory path to follow to the target data. File names cannot be specified. The path is relative to the storage location given in the external stage setup.
Pattern
= string
Specify, using regular expression (RegEx) pattern syntax, files to be matched on the external stage.
For example: '.*flight.*[.]csv'
Format
= drop-down
Select a pre-made file format that will automatically set many of the component properties accordingly. These formats can be created through the Create File Format component.
The list of file formats is drawn from the database and schema in which the specified tables exist.
Warning
In versions of Matillion ETL prior to 1.76, jobs could use file formats that existed in the environment default schema. Existing jobs that used these formats will no longer work after an upgrade to 1.76 or later. To fix this, recreate those file formats in the same schemas as the tables.
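In Snowflake SQL, a named file format of the kind selected here is created with the CREATE FILE FORMAT command. A minimal sketch with an illustrative name:

```sql
-- Illustrative named file format for comma-delimited CSV files
-- with a single header row to skip.
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;
```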
File Type
= drop-down
The type of expected data to load. Some data may require additional formatting, explained in Preparing to Load Data.
Available options are: AVRO, CSV, JSON, ORC, PARQUET, and XML.
Component properties will change to reflect the choice made here and give options based on the specific file type.
Enable Octal
= boolean
Enables parsing of octal numbers.
Compression
= drop-down
Whether the input file is compressed, and in which format: GZIP, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, ZSTD, or no compression at all.
Record Delimiter
= string
The delimiter used to separate records (rows) in the file. Defaults to newline. \n can also signify a newline, and \r can signify a carriage return.
Field Delimiter
= string
The delimiter that separates columns. The default is a comma. A [TAB] character can be specified as \t.
Skip Header
= integer
The number of rows at the top of the file to ignore. The default setting is 0.
Escape
= string
When this option is specified, the backslash character (\) in input data is treated as an escape character.
Escape Unenclosed Field
= string
Single character string used as the escape character for unenclosed field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). If a character is specified in the Escape field, it will override this field.
Trim Space
= boolean
Removes trailing and leading whitespace from the input data.
Field Optionally Enclosed
= string
A character used to enclose strings: a single quote ('), a double quote ("), or NONE (default). To include the enclosing character within a field, escape it with an instance of the same character.
Null If
= string
This option replaces the specified string with NULL in the output table. Use this if your data has a particular representation of missing data.
Error On Column Count Mismatch
= boolean
Generate an error if the number of delimited columns in the input does not match that of the table. If false, extra columns are not loaded into the table and missing columns are recorded as NULL in the table.
Empty Field As Null
= boolean
If this is set, empty columns in the input file will become NULL.
Encoding Type
= drop-down
The type of encoding that has been applied to the data. The default setting is UTF-8.
Strip Outer Array
= boolean
Enables removal of outer square brackets from JSON data.
Strip Null Values
= boolean
Enables removal of NULL values.
Ignore UTF-8 Errors
= boolean
When true, replaces invalid UTF-8 sequences with the Unicode replacement character (U+FFFD), instead of throwing an error.
Preserve Space
= boolean
When parsing XML files, leading and trailing spaces in elements are preserved if set to true.
Strip Outer Element
= boolean
When parsing XML files, this will strip the outermost XML element, exposing the second-level elements as separate documents.
Disable Snowflake Data
= boolean
When parsing XML files, will disable recognition of Snowflake semi-structured data tags if set to true.
Disable Auto Convert
= boolean
When parsing XML files, disables automatic conversion of numeric and Boolean values from text to their native representation if set to true.
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
✅ | ❌ | ❌ | ❌ | ❌ |