Create External Table
Create a table that references data stored in an Amazon S3 bucket. The data is held externally, meaning the table itself does not hold the data. The Matillion ETL instance must have access to the chosen external data source.
Referencing externally held data is useful for querying large datasets without storing that same volume of data on the Redshift cluster.
External tables are part of Amazon Redshift Spectrum and may not be available in all regions. For a list of supported regions, read Amazon Redshift endpoints and quotas.
For more information about working with external tables, read Creating external tables for Redshift Spectrum.
Note
Nested data loads from JSON or Parquet file formats may also be set up using this component via the Define Nested Metadata checkbox in the Table Metadata property.
Properties
Name
= string
A human-readable name for the component.
Schema
= drop-down
Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.
New Table Name
= string
The name of the external table to be created or used.
Create/Replace
= drop-down
- Create: Create the new table with the given name. Will fail if a table of that name already exists.
- Create if not exists: Create the new table with the given name unless one already exists. Succeeds and continues in either case. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct it. This could lead to errors later in the job if you did not expect the table to already exist, or expected it to have the schema defined in this component.
- Replace: Create the new table, potentially overwriting any existing table of the same name. Because other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects.
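The three Create/Replace modes can be summarized as a small decision function. This is a minimal sketch of the behavior described above, not Matillion ETL's implementation; the function name `plan_actions` and the step labels are hypothetical, and it assumes that Replace only drops a table when one already exists.

```python
def plan_actions(mode: str, table_exists: bool) -> list[str]:
    """Return the ordered steps implied by a Create/Replace mode.

    Hypothetical model of the documented behavior; the step labels
    are illustrative and are not real SQL or Matillion API calls.
    """
    if mode == "Create":
        if table_exists:
            # Create fails when a table of that name already exists.
            raise RuntimeError("table already exists")
        return ["create"]
    if mode == "Create if not exists":
        # Succeeds either way; an existing table is left untouched,
        # even if its schema differs from the one defined here.
        return [] if table_exists else ["create"]
    if mode == "Replace":
        # Assumption: the cascading drop is only needed when a table exists.
        return (["drop cascade"] if table_exists else []) + ["create"]
    raise ValueError(f"unknown mode: {mode}")


print(plan_actions("Create if not exists", table_exists=True))
print(plan_actions("Replace", table_exists=True))
```

Note that only Replace can remove dependent objects, so it is the mode to review most carefully before running a job against a shared schema.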
Table Metadata
= column editor
- Column Name: The name of the new column.
- Data Type: Select the data type.
- Text: Can hold any type of data, subject to a maximum size.
- Integer: Suitable for whole-number types (no decimals).
- Numeric: Suitable for numeric types, with or without decimals.
- Boolean: Suitable for data that is either true or false.
- Date: Suitable for dates without times.
- DateTime: Suitable for dates, times, or timestamps (both date and time).
- Size: For text types, this is the maximum length. This is a limit on the number of bytes, not characters. For Amazon Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes. For numeric types, this is the total number of digits allowed, whether before or after the decimal point.
- Decimal Places: Relevant only for numerics, it is the maximum number of digits that may appear to the right of the decimal point.
- Define Nested Metadata: When the Define Nested Metadata checkbox is ticked inside the Table Metadata property, a tree structure can be defined for metadata.
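Because Size limits bytes rather than characters for text columns, it can help to check the UTF-8 encoded length of representative values before choosing a size. The following is a minimal sketch; the helper names `utf8_size` and `fits_in_column` are hypothetical and are not part of Matillion ETL or Amazon Redshift.

```python
def utf8_size(value: str) -> int:
    """Number of bytes the value occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def fits_in_column(value: str, size: int) -> bool:
    """True if the value fits in a text column whose Size is `size` bytes."""
    return utf8_size(value) <= size


# "caf\u00e9" is "café" written with an explicit escape for the accented letter.
for text in ["cafe", "caf\u00e9", "\u65e5\u672c\u8a9e"]:
    print(f"{text!r}: {len(text)} characters, {utf8_size(text)} bytes")
```

For instance, "café" is 4 characters but 5 bytes, so it would not fit in a text column with a Size of 4.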
Partition
= dual listbox
Assign one or more columns in this table as potential partitions.
Partition columns allow queries on large datasets to be optimized when those queries filter on the columns chosen as partition columns. When a partition is created, each value of that column maps to a distinct Amazon S3 storage location, so rows of data are stored in a location that depends on their partition column value.
For example, it is common for a date column to be chosen as a partition column, thus storing all other data according to the date it belongs to. When creating partitioned data using the Add Partition component, it is vital that those partitioned columns have already been marked using this property.
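To illustrate how partition values map to storage locations, the sketch below groups rows by a date column and derives one S3 prefix per value. It assumes the common `column=value` folder convention, which this page does not mandate; the bucket name, column names, and helper functions are all hypothetical.

```python
from collections import defaultdict


def partition_prefix(base: str, column: str, value: str) -> str:
    """Build the S3 prefix for one partition value (assumed column=value layout)."""
    return f"{base.rstrip('/')}/{column}={value}/"


def group_by_partition(rows: list[dict], column: str) -> dict:
    """Group rows by their value in the partition column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


rows = [
    {"sale_id": 1, "sale_date": "2024-01-01"},
    {"sale_id": 2, "sale_date": "2024-01-01"},
    {"sale_id": 3, "sale_date": "2024-01-02"},
]

# Hypothetical bucket; each distinct date gets its own storage location.
for value, members in group_by_partition(rows, "sale_date").items():
    prefix = partition_prefix("s3://example-bucket/sales/", "sale_date", value)
    print(prefix, len(members), "rows")
```

A query that filters on `sale_date` then only needs to read the locations for the matching dates rather than the whole dataset.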
Location
= drop-down
The Amazon S3 bucket location for the external table data. The Matillion ETL instance must have access to this data. Typically, access is granted through the AWS credentials on the instance, or the bucket is public.
Format
= drop-down
Choose a format for the source file.
Field Terminator
= string
(TEXTFILE only) The delimiter that separates fields (columns) in the file. Defaults to \\A.
Line Terminator
= string
(TEXTFILE only) The delimiter that separates records (rows) in the file. Defaults to newline. \\n can also signify a newline. \\r can signify a carriage return.
Skip Header Rows
= integer
The number of rows at the top of the file to skip. The default setting is an empty field.
Strip Outer Array
= drop-down
(JSON only) Strips the outer array from the JSON file, enabling JSON files that contain a single, anonymous array to be loaded without error. The default setting is No.
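To show how the TEXTFILE properties above relate to an Amazon Redshift Spectrum table definition, the sketch below assembles a `CREATE EXTERNAL TABLE` statement from similar inputs. It is an illustration only and is not necessarily the statement Matillion ETL generates; the function name, schema, table, columns, and bucket are hypothetical, and values are interpolated without escaping, so it is not suitable for untrusted input.

```python
def build_external_table_ddl(
    schema,
    table,
    columns,
    location,
    partitions=(),
    field_terminator=None,
    line_terminator=None,
    skip_header_rows=None,
):
    """Assemble a CREATE EXTERNAL TABLE statement for a TEXTFILE source.

    `columns` and `partitions` are sequences of (name, data_type) pairs.
    Values are inserted as-is (no escaping), for illustration only.
    """
    column_sql = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = [f"CREATE EXTERNAL TABLE {schema}.{table} ({column_sql})"]
    if partitions:
        partition_sql = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
        parts.append(f"PARTITIONED BY ({partition_sql})")
    if field_terminator is not None or line_terminator is not None:
        row_format = "ROW FORMAT DELIMITED"
        if field_terminator is not None:
            row_format += f" FIELDS TERMINATED BY '{field_terminator}'"
        if line_terminator is not None:
            row_format += f" LINES TERMINATED BY '{line_terminator}'"
        parts.append(row_format)
    parts.append("STORED AS TEXTFILE")
    parts.append(f"LOCATION '{location}'")
    if skip_header_rows:
        # Maps to the Skip Header Rows property.
        parts.append(
            f"TABLE PROPERTIES ('skip.header.line.count'='{skip_header_rows}')"
        )
    return "\n".join(parts) + ";"


ddl = build_external_table_ddl(
    schema="spectrum_schema",  # hypothetical external schema
    table="sales",
    columns=[("sale_id", "integer"), ("amount", "decimal(10,2)")],
    location="s3://example-bucket/sales/",  # hypothetical bucket
    partitions=[("sale_date", "date")],
    field_terminator=",",
    line_terminator=r"\n",
    skip_header_rows=1,
)
print(ddl)
```

Partition columns appear only in the `PARTITIONED BY` clause and not in the main column list, which is why the sketch takes them as a separate argument.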
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
❌ | ❌ | ✅ | ❌ | ❌ |