
Create Table

Create or replace a table.

Delta Lake on Databricks: Users may experience casting errors if using binary values.

Azure Synapse Analytics: When using the FLOAT data type, users can enter any size value from 1 to 2,147,483,647. If a FLOAT with a size less than or equal to 7 is used, a column of type REAL is produced in the database; note that this column will still appear as a FLOAT type within Matillion ETL, simply with a smaller size. Conversely, if a FLOAT with a size greater than or equal to 8 is used, a standard FLOAT column is used both in the database and in Matillion ETL. To learn more, read float and real (Transact-SQL).

Warning

Depending on the chosen settings, this component can be destructive. Take care when running this component as it may remove existing data.


Properties

Name = string

A human-readable name for the component.


Create/Replace = drop-down

  • Create: Create the new table with the given name. Will fail if a table of that name already exists.
  • Create if not exists: Will create the new table with the given name unless one already exists, and will succeed and continue in either case. If a table of the same name already exists, no attempt is made to check or correct its schema, which could lead to errors later in the job if you did not expect an existing table, or if its schema differs from the one defined in this component.
  • Replace: Will create the new table, potentially overwriting any existing table of the same name.

Since other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects.
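As a rough sketch (the table and column names are hypothetical, and the DDL the component actually generates may differ), the three options correspond to the following Snowflake statements:

```sql
-- Create: fails if "my_table" already exists
CREATE TABLE "my_table" ("id" NUMBER);

-- Create if not exists: succeeds either way, but never alters an existing table's schema
CREATE TABLE IF NOT EXISTS "my_table" ("id" NUMBER);

-- Replace: drops any existing table of the same name, losing its data
CREATE OR REPLACE TABLE "my_table" ("id" NUMBER);
```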


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


New Table Name = string

The name of the table to create or replace. This field is case-sensitive by default, since Matillion ETL uses quoted identifiers.


Table Type = drop-down

The type of Snowflake table to create:

  • Permanent: A table that holds data indefinitely and that can be restored using Snowflake's Time Travel feature.
  • Temporary: A table that is automatically destroyed at the end of the Snowflake session.
  • Transient: A table that holds data indefinitely, but that cannot be restored.
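The three table types correspond to the following Snowflake DDL, sketched here with hypothetical table names:

```sql
-- Permanent (the default): data is retained indefinitely and recoverable via Time Travel
CREATE TABLE "perm_table" ("id" NUMBER);

-- Temporary: dropped automatically when the Snowflake session ends
CREATE TEMPORARY TABLE "temp_table" ("id" NUMBER);

-- Transient: data is retained indefinitely, but with no Fail-safe recovery period
CREATE TRANSIENT TABLE "trans_table" ("id" NUMBER);
```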

Columns = column editor

  • Column Name: The name of each new column.
  • Data Type: The data type of each column.
    • VARCHAR: Can hold any kind of data, subject to a maximum size.
    • NUMBER: Can hold numeric data; the number of decimal places is controlled by the Precision property. You can enter AUTOINCREMENT or IDENTITY as the default value for any numeric-type column and the column will automatically be given an incremented value on each new row, starting from 1. This feature is useful when creating a unique key on the table.
    • FLOAT: Can hold numeric types, with or without decimals.
    • BOOLEAN: Can hold data that is either true or false.
    • DATE: Can hold dates without times.
    • TIMESTAMP: Can hold timestamps.
    • TIME: Can hold times.
    • VARIANT: A flexible type that can be used for any purpose.
  • Size: For text types, this is the maximum length, specified as a limit on the number of bytes, not characters. Since data is stored using UTF-8, any non-ASCII character counts as two or more bytes.
  • Precision: Relevant only for numeric data, it is the maximum number of digits that may appear to the right of the decimal point (that is, the scale).
  • Default Value: The default value under this column for any row.
  • Not Null: True if this column does not accept null values.
  • Unique: Mark this column as unique, declaring that all values in it are unique with respect to one another. You may have multiple unique columns per table.
  • Comment: A location to store descriptive VARCHAR comments against columns. Used to contextualize the content being stored in tables in your database.
  • Masking Policy: Add a dynamic data masking policy to a column. When you add a new column, the default masking policy is blank and must be set if required.
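As an illustrative sketch of how several of these column settings combine (all names are hypothetical, including the email_mask masking policy, which must already exist), the generated DDL might resemble:

```sql
CREATE TABLE "customers" (
    "customer_id" NUMBER(38,0) IDENTITY NOT NULL,   -- auto-increments from 1
    "email"       VARCHAR(255) COMMENT 'Contact address'
                  NOT NULL WITH MASKING POLICY email_mask UNIQUE,
    "balance"     NUMBER(12,2) DEFAULT 0,           -- default value for new rows
    "created_at"  TIMESTAMP
);
```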

Default DDL Collation = string

Set the default DDL collation. Setting this parameter forces all subsequently created columns in the affected table to have the specified collation as the default, unless the collation for the column is explicitly defined in the DDL.
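For example, a table-level default collation might be declared as follows (the collation specifications are illustrative); the explicit column-level COLLATE overrides the table default:

```sql
CREATE TABLE "collated_table" (
    "code" VARCHAR(10),                  -- inherits the table default 'en-ci'
    "name" VARCHAR(50) COLLATE 'en-cs'   -- explicit collation takes precedence
)
DEFAULT_DDL_COLLATION = 'en-ci';
```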


Primary Keys = drop-down

Declare a column to be a primary key. Primary keys are, by default, unique. Users can only have one primary key per table.


Clustering Keys = dual listbox

Specify clustering keys to use. Defining clustering keys can allow for more efficient sorting and scanning of large tables.


Data Retention Time in Days = integer

Number of days for which data is retained after deletion. To learn more, read Understanding & Using Time Travel.
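Combining the preceding properties, a hypothetical table with clustering keys, a Time Travel retention period, and a comment might be created with DDL along these lines:

```sql
CREATE TABLE "events" (
    "event_date" DATE,
    "account_id" NUMBER,
    "payload"    VARIANT
)
CLUSTER BY ("event_date", "account_id")   -- clustering keys
DATA_RETENTION_TIME_IN_DAYS = 7           -- Time Travel window
COMMENT = 'Raw event feed';
```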


Comment = string

Attach a comment to the table.

Name = string

A human-readable name for the component.


Create Method = drop-down

  • Create: Create the new table with the given name. Will fail if a table of that name already exists.
  • Create if not exists: Will create the new table with the given name unless one already exists, and will succeed and continue in either case. If a table of the same name already exists, no attempt is made to check or correct its schema, which could lead to errors later in the job if you did not expect an existing table, or if its schema differs from the one defined in this component.
  • Replace: Will create the new table, potentially overwriting any existing table of the same name. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost. This method is restricted to tables of file type Delta.

Catalog = drop-down

Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.


Database = drop-down

Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.


Table Name = string

The name of the table to create or replace.


Table Metadata = column editor

  • Column Name: The column name from the input flow. Add as many rows to the editor as you need, one per input column.
  • Data Type: Select from INTEGER, NUMBER, FLOAT, TEXT, TIMESTAMP, DATE, BOOLEAN, BINARY as the data type for this column.
  • Size: Set the data type size.
  • Scale: Set the data type scale.
  • Comment: A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database.

Table Type = drop-down

Select whether the new table will be of type Managed or Unmanaged. The default setting is Managed. Read Managed table for more information.


Using = drop-down

Select the file type. Available types: CSV, Delta, JSON, ORC, Parquet. This property is only available when Table Type is set to Unmanaged.


Location = string

(AWS only) Specify the S3 bucket location. This property is only available when Table Type is set to Unmanaged.


Storage Account = drop-down

(Azure only) Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview. This property is only available when Table Type is set to Unmanaged.


Blob Container = string

(Azure only) A Blob Storage location. The available blob containers will depend on the selected storage account. This property is only available when Table Type is set to Unmanaged.


Partition Keys = dual listbox

Specify any columns to be used as partition keys.


Table Properties = column editor

Specify any table metadata using key=value pairs.

  • Key: Specify the key.
  • Value: Specify the key's value.
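Putting several of these Delta Lake properties together, an unmanaged table might be declared with DDL along the following lines (the database, S3 bucket, and table property values are hypothetical):

```sql
-- Unmanaged (external) Delta table on S3, partitioned, with table properties
CREATE OR REPLACE TABLE demo_db.web_logs (
    log_date DATE,
    user_id  INT,
    raw_line STRING
)
USING DELTA
PARTITIONED BY (log_date)                   -- partition keys
LOCATION 's3://example-bucket/web_logs/'    -- external storage location
COMMENT 'Raw web server logs'
TBLPROPERTIES ('delta.appendOnly' = 'true');
```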


Comment = string

Attach a comment to the table.

Name = string

A human-readable name for the component.


Schema = drop-down

Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.


New Table Name = string

The name of the table to create or replace.


Create/Replace = drop-down

  • Create: Create the new table with the given name. Will fail if a table of that name already exists.
  • Create if not exists: Will create the new table with the given name unless one already exists, and will succeed and continue in either case. If a table of the same name already exists, no attempt is made to check or correct its schema, which could lead to errors later in the job if you did not expect an existing table, or if its schema differs from the one defined in this component.
  • Replace: Will create the new table, potentially overwriting any existing table of the same name.

Since other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects.


Table Metadata = column editor

  • Column Name: The name of the new column.
  • Data Type: The data type of each column. For more information, read Data types.
    • Text: Can hold any type of data, subject to a maximum size.
    • Integer: Suitable for whole-number types (no decimals).
    • Numeric: Suitable for numeric types, with or without decimals.
    • Real: Suitable for data of a single precision floating-point number.
    • Double Precision: Suitable for data of a double precision floating-point number.
    • SUPER: Use the SUPER data type to store semi-structured data or documents as values.
    • Boolean: Suitable for data that is either true or false.
    • Date: Suitable for dates without times.
    • DateTime: Suitable for dates, times, or timestamps (both date and time).
  • Size: For text types, this is the maximum length. This is a limit on the number of bytes, not characters. With Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes. For numeric types, this is the total number of digits allowed, whether before or after the decimal point.
  • Decimal Places: Relevant only for numeric data, it is the maximum number of digits that may appear to the right of the decimal point.
  • Encoding Type: The Redshift compression encoding. Read Compression encodings for details of the various available encodings. Although users can manually specify the encoding type here, it is advised to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output, provided you are truncating the table.
  • Allow Nullable: When "True", Matillion ETL specifies that the column accepts null values. When "False", Matillion ETL specifies that the column is not allowed to contain null values. Default is "True".
  • Comment: A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database.

Distribution Style = drop-down

  • All: Copy rows to all nodes in the Redshift cluster.
  • Auto: (Default) Allow Redshift to manage your distribution style.
  • Even: Distribute rows around the Redshift cluster evenly.
  • Key: Distribute rows around the Redshift cluster according to the value of a key column.

Note

Table distribution is critical to good performance. Read the Distribution styles documentation for more information.


Sort Key = dual listbox

This is optional, and lets users specify one or more columns from the input that should be set as the table's sort key.

Note

Sort keys are critical to good performance. Read Working with sort keys for more information.


Sort Key Options = drop-down

Decide whether the sort key is of a compound or interleaved variety.


Primary Keys = dual listbox

Select one or more columns to be designated as the table's primary key.


Identity Columns = column editor

  • Column Name: The name of the column. It must match the name of a column defined in the Table Metadata property, which will be set as an identity column. An identity column contains automatically generated values. The specified column must be of data type "Integer" or "Numeric" with zero (0) decimal places.
  • Seed: Specifies the starting value.
  • Step: Specifies the increment between values.

Backup Table = drop-down

Specify whether the created table is to be included in automated and manual cluster snapshots. Setting "No" has no effect on automatic replication of data to other nodes within the cluster, so tables set to "No" are still restored in the event of a node failure.

The default setting is "Yes".
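As an illustrative sketch of how these Redshift properties combine (table, column, and encoding choices are hypothetical), the generated DDL might resemble:

```sql
CREATE TABLE IF NOT EXISTS sales (
    sale_id   BIGINT  IDENTITY(1, 1),       -- identity column: seed 1, step 1
    store_id  INTEGER NOT NULL ENCODE az64, -- explicit compression encoding
    sale_date DATE    NOT NULL,
    amount    NUMERIC(12, 2),
    PRIMARY KEY (sale_id)
)
DISTSTYLE KEY
DISTKEY (store_id)                          -- Key distribution style
SORTKEY (sale_date)                         -- compound sort key
BACKUP YES;                                 -- include in cluster snapshots
```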

Name = string

A human-readable name for the component.


Project = drop-down

Select the Google Cloud project. The special value, [Environment Default], will use the project defined in the environment. For more information, read Creating and managing projects.


Dataset = drop-down

Select the Google BigQuery dataset to load data into. The special value, [Environment Default], will use the dataset defined in the environment. For more information, read Introduction to datasets.


New Table Name = string

The name of the table to create or replace.


Table Metadata = column editor

  • Field Name: The name of the new field (column).
  • Data Type: Set the field data type.
    • String: Can hold any kind of data, subject to a maximum size.
    • Integer: Suitable for whole-number types (no decimals).
    • Float: Suitable for numeric types, with or without decimals.
    • Numeric: Suitable for data of an exact numeric value, allowing 38 digits of precision and 9 decimal digits of scale.
    • Boolean: Suitable for data that is either true or false.
    • Date: A formatted date object without time.
    • Time: A formatted time object without date.
    • DateTime: A formatted timestamp containing both date and time that is easily readable by the user.
    • Timestamp: A timestamp left unformatted (exists as Unix/Epoch Time).
  • Mode: The field mode. Default is NULLABLE.
    • NULLABLE: Field allows null values.
    • REQUIRED: Field does not accept null values.
    • REPEATED: Field can accept multiple values.
  • Comment: A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database.
  • Define Nested Metadata: When the Define Nested Metadata checkbox is ticked inside the 'Table Metadata' property, a tree-structure can be defined for metadata.

Create/Replace = drop-down

  • Create: Create the new table with the given name. Will fail if a table of that name already exists.
  • Create if not exists: Will create the new table with the given name unless one already exists, and will succeed and continue in either case. If a table of the same name already exists, no attempt is made to check or correct its schema, which could lead to errors later in the job if you did not expect an existing table, or if its schema differs from the one defined in this component.
  • Replace: Will create the new table, potentially overwriting any existing table of the same name.

Since other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects.


Partitioning = drop-down

Choose whether to create a date-partitioned table or not.

  • Column: Partitions the table's data according to the specification in the Partitioning Field property, activated when this option is selected.
  • Day: Selecting Day will create the table as a partition table, which will load data into separate date-partitions and can be queried with the _PARTITIONTIME pseudo-column. Select Day when your data is spread out over a wide range of dates or if your data is continually added over time. This allows large datasets to be split into multiple parts with each part corresponding to a single day of data.
  • Hour: Select Hour if your tables have a high volume of data spanning a short date range (typically less than six months of timestamp values). Hourly partitioning allows the addressing of data at hour-level granularity, such as when appending, truncating, or deleting data from a particular partition.
  • Month: Select Month if your tables have a relatively small amount of data for each day, but span a wide date range.
  • None: No partitioning.
  • Year: Select Year if your tables have a relatively small amount of data for each day, but span a wide date range. Both Year and Month are recommended options if your workflow requires frequently updating or adding rows that span a wide date range (for example, more than a few hundred dates).

For more information, read the section "Daily partitioning versus hourly, monthly, or yearly partitioning" in Introduction to partitioned tables.

Enabling partitioning on a table will activate additional properties in the Table Input component, which allows the partition column to be included in the input.
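For example, a column-partitioned table with a partition expiry might be declared with DDL along these lines (note that BigQuery DDL expresses expiry in days, while this component takes milliseconds; the dataset and table names are hypothetical):

```sql
-- Partitioned on a DATE column, with a 90-day partition expiry
CREATE TABLE IF NOT EXISTS my_dataset.page_views (
    view_date DATE,
    url       STRING,
    hits      INT64
)
PARTITION BY view_date
OPTIONS (partition_expiration_days = 90);
```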


Partitioning Field = drop-down

This parameter appears when "Column" is selected in the Partitioning property. The drop-down is populated only with the Date and Timestamp columns defined in the table metadata. For more information, read Creating column partitions.


Partition Expiration Time = integer

Specify the number of milliseconds before the partition expires (field can be left blank for no expiration).


KMS Encryption = drop-down

Support for customer-managed encryption. By default, this is set to "No". Your encryption keys are stored within Cloud KMS. For more information, read Customer managed keys.


Location = drop-down

Choose a location that matches the location of the BigQuery dataset.


Key Ring = drop-down

Choose from a group of keys. Key rings are created and managed in Cloud KMS on the GCP console.


Key = drop-down

Choose the encryption key from the key ring.


Cluster Columns = drop-down

Creates a clustered table. Allows the optional selection of up to four columns. Columns of types Integer, String, Boolean, Date, and Timestamp are supported. For more information, read Clustered tables.
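A partitioned, clustered table with a customer-managed encryption key might be declared as follows (the project, key ring, and key names are hypothetical, and the key must already exist in Cloud KMS):

```sql
CREATE TABLE my_dataset.orders (
    order_ts    TIMESTAMP,
    customer_id INT64,
    region      STRING
)
PARTITION BY DATE(order_ts)
CLUSTER BY region, customer_id    -- up to four clustering columns
OPTIONS (
    kms_key_name = 'projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key'
);
```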

Name = string

A human-readable name for the component.


Create/Replace = drop-down

  • Create: Create the new table with the given name. Will fail if a table of that name already exists.
  • Create if not exists: Will create the new table with the given name unless one already exists, and will succeed and continue in either case. If a table of the same name already exists, no attempt is made to check or correct its schema, which could lead to errors later in the job if you did not expect an existing table, or if its schema differs from the one defined in this component.
  • Replace: Will create the new table, potentially overwriting any existing table of the same name.

Since other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects.


Schema = drop-down

Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on schemas, read the Azure Synapse documentation.


New Table Name = string

The name of the table to create or replace.

This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. For more information, read CREATE TABLE.


Table Metadata = column editor

  • Column Name: The name of the new column.
  • Data Type: The data type of each column.
    • DATETIME Suitable for timestamps.
    • TIME Suitable for times.
    • INTEGER Suitable for whole number types (no decimals).
    • NUMERIC Suitable for numeric types, with or without decimals.
    • TEXT Suitable for text types.
    • FLOAT Suitable for approximate number data types for use with floating point numeric data.
    • BOOLEAN Suitable for data where values are either true or false.
  • Size: The size column sets the size, except for the data types BOOLEAN, DATE, DATETIME, and TIME.
    • For more information about DATE, read date (Transact-SQL).
    • For DATETIME, the size column sets the precision of the datetime offset. Accepted values range from 0 to 7, representing the number of fractional-second digits (from 0 to 9999999).
      • If the DATETIME size is set to 0, the returned datetime will be 26 positions (YYYY-MM-DD hh:mm:ss {+|-}hh:mm).
      • If the DATETIME size is set to a precision of 1, the returned datetime will be 28 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.n {+|-}hh:mm).
      • If the DATETIME size is set to a precision of 7, the returned datetime will be 34 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.nnnnnnn {+|-}hh:mm).
      • For more information, read datetime (Transact-SQL).
    • For TIME, the size column sets the precision of the time offset. Accepted values range from 0 to 7, representing the number of fractional-second digits (from 0 to 9999999).
      • If the TIME size is set to 0, the returned time will be 8 positions (hh:mm:ss).
      • If the TIME size is set to 1, the returned time will be 10 positions (8 + precision + 1) (hh:mm:ss.n).
      • If the TIME size is set to 7, the returned time will be 16 positions (8 + precision + 1) (hh:mm:ss.nnnnnnn).
  • Scale: Define the scale.
  • Allow Nullable: Select whether to allow nullable values.

Distribution Style = drop-down

Select the distribution style:

  • Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution.
  • Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in a SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse.
  • Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behavior.

To learn more, read Guidance for designing distributed tables using dedicated SQL pool in Azure Synapse Analytics.


Distribution Column = drop-down

Select the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash".


Index Type = drop-down

Select the table indexing type. Options include:

  • Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage of using a clustered index is that the only queries that benefit are those that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property.
  • Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property.
  • Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table.

Index Column Grid = column editor

  • Name: The name of each column.
  • Sort: Assign a sort orientation of either ascending (Asc) or descending (Desc).

Index Column Order = dual listbox

Select the columns in the order to be indexed.


Partition Key = drop-down

Select the table's partition key. Table partitions determine how rows are grouped and stored within a distribution. For more information about table partitions, read Partitioning tables in dedicated SQL pool.
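Bringing these Azure Synapse properties together, the generated DDL might resemble the following sketch (table, column, and partition boundary values are hypothetical):

```sql
CREATE TABLE dbo.fact_sales (
    sale_id   INT            NOT NULL,
    store_id  INT            NOT NULL,   -- distribution column; NOT NULL recommended
    sale_date DATE           NOT NULL,
    amount    DECIMAL(12, 2)
)
WITH (
    DISTRIBUTION = HASH (store_id),      -- Hash distribution style
    CLUSTERED COLUMNSTORE INDEX,         -- default index type
    PARTITION (sale_date RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01'))
);
```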


Variable Exports

This component makes the following values available to export into variables:

  • Table Recreated: Whether or not the table was (re)created. This is useful when Create/Replace is set to Create if not exists, so users can discern whether the table needed to be created or not.

Strategy

Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.

