
Rewrite Table

Write the input data flow out to a new table.

Runtime errors may occur, for example if a data value overflows the maximum allowed size of a field.

Warning

The output table is overwritten each time the component is executed, so do not use this component to output permanent data that you do not want to overwrite.


Properties

Snowflake

Name = string

A human-readable name for the component.


Warehouse = drop-down

The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Target Table = string

Provide a new table name.

Previous versions of this component prepended t_ to the target table name to help avoid clashing with existing tables; however, this is no longer the case.


Order By = column editor

  • Column: Select the column(s) to sort by.
  • Sort Order: Set the corresponding column to be ordered ascending or descending. The default sort order is ascending.
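Conceptually, these properties combine into a single CREATE TABLE ... AS SELECT statement. The following is a hypothetical sketch only; all table and column names are placeholders, and the SQL that Matillion ETL actually generates may differ:

```sql
-- Hypothetical sketch; names are placeholders.
CREATE OR REPLACE TABLE "MY_DB"."MY_SCHEMA"."flights_rewrite" AS
SELECT *
FROM "MY_DB"."MY_SCHEMA"."flights_input"     -- stands in for the input flow
ORDER BY "flight_date" ASC, "carrier" DESC;  -- Order By property
```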

Delta Lake on Databricks

Name = string

A human-readable name for the component.


Catalog = drop-down

Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.


Database = drop-down

Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.


Table = drop-down

The name of the output table.


Partition Keys = dual listbox

Select any input columns to be used as partition keys.


Table Properties = column editor

  • Key: A metadata property within the table. These are expressed as key=value pairs.
  • Value: The value of the corresponding row's key.

Comment = string

A descriptive comment for the table.
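On Delta Lake, these properties correspond to clauses of a CREATE TABLE statement. A hypothetical sketch, with placeholder names and an assumed table property, is:

```sql
-- Hypothetical sketch; catalog, database, table names, and the
-- TBLPROPERTIES key are placeholders for illustration only.
CREATE OR REPLACE TABLE my_catalog.my_database.flights_rewrite
USING DELTA
PARTITIONED BY (flight_date)                  -- Partition Keys
TBLPROPERTIES ('delta.appendOnly' = 'false')  -- Table Properties (key=value)
COMMENT 'Rewritten flight data'               -- Comment
AS SELECT * FROM my_catalog.my_database.flights_input;
```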

Amazon Redshift

Name = string

A human-readable name for the component.


Schema = drop-down

Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.


Target Table = string

Provide a new table name.

Previous versions of this component prepended t_ to the target table name to help avoid clashing with existing tables; however, this is no longer the case.


Table Sort Key = drop-down

Optional. Specifies the input columns to set as the table's sort key.

Sort keys are critical to good performance. See the Amazon Redshift documentation for more information.


Sort Key Options = drop-down

Choose the type of sort key to be used.

  • Compound: A compound key is made up of all of the columns listed in the sort key definition, in the order they are listed. Most useful for tables that will be queried with filters using prefixes of the sort keys.
  • Interleaved: An interleaved sort gives equal weight to each column, or subset of columns, in the sort key. Most useful when multiple queries use different columns for filters.

Table Distribution Style = drop-down

  • Even: Distribute rows around the Redshift cluster evenly.
  • All: Copy rows to all nodes in the Redshift cluster.
  • Key: Distribute rows around the Redshift cluster according to the value of a key column.

Table distribution is critical to good performance. See the Amazon Redshift documentation for more information.


Table Distribution Key = drop-down

This is only displayed if the Table Distribution Style is set to Key. It is the column used to determine which cluster node the row is stored on.
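On Redshift, the sort key and distribution properties map onto table attributes of a CREATE TABLE ... AS statement. A hypothetical sketch, with placeholder names:

```sql
-- Hypothetical sketch; table and column names are placeholders.
CREATE TABLE "public"."orders_rewrite"
DISTSTYLE KEY
DISTKEY ("customer_id")                         -- Table Distribution Key
COMPOUND SORTKEY ("order_date", "customer_id")  -- Table Sort Key + Sort Key Options
AS SELECT * FROM "public"."orders_input";
```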

Google BigQuery

Name = string

A human-readable name for the component.


Project = drop-down

Select the Google Cloud project. The special value, [Environment Default], will use the project defined in the environment. For more information, read Creating and managing projects.


Dataset = drop-down

Select the Google BigQuery dataset to load data into. The special value, [Environment Default], will use the dataset defined in the environment. For more information, read Introduction to datasets.


Target Table = string

Provide a new table name.

Previous versions of this component prepended t_ to the target table name to help avoid clashing with existing tables; however, this is no longer the case.

Azure Synapse Analytics

Name = string

A human-readable name for the component.


Schema = drop-down

Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on schemas, read the Azure Synapse documentation.


Table = string

Specify the name of the table to create.

This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. For more information, please refer to the Azure Synapse documentation.


Distribution Style = drop-down

Select the distribution style:

  • Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution.
  • Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in an SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse.
  • Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behavior.

For more information, please read [this article](https://learn.microsoft.com/en-gb/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute).


Distribution Column = drop-down

Select the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash".


Partition Key = drop-down

Select the table's partition key. Table partitions determine how rows are grouped and stored within a distribution.

For more information on table partitions, please refer to this article.


Index Type = drop-down

Select the table indexing type. Options include:

  • Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage of using a clustered index is that the only queries that benefit are those that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property.
  • Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property.
  • Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table.

For more information, please consult the Azure Synapse documentation.
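In Synapse, the distribution and index properties correspond to options in the WITH clause of a CREATE TABLE AS SELECT (CTAS) statement. A hypothetical sketch, with placeholder names:

```sql
-- Hypothetical sketch; schema, table, and column names are placeholders.
CREATE TABLE [dbo].[sales_rewrite]
WITH
(
    DISTRIBUTION = HASH([customer_id]),  -- Distribution Style + Distribution Column
    CLUSTERED COLUMNSTORE INDEX          -- Index Type (the default)
)
AS SELECT * FROM [dbo].[sales_input];
```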


Index Column Grid = column editor

  • Name: The name of each column.
  • Sort: Assign a sort orientation of either ascending (Asc) or descending (Desc).

Index Column Order = dual listbox

Select the columns in the order to be indexed.


Strategy

Drop and recreate a target table, and at runtime perform a bulk-insert from the input flow.
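In SQL terms, the strategy is roughly equivalent to the following hypothetical sketch (placeholder names and columns; not the literal statements the component generates):

```sql
-- Hypothetical sketch of the drop-and-recreate strategy.
DROP TABLE IF EXISTS "my_schema"."target_table";

CREATE TABLE "my_schema"."target_table" (
    "id"   INTEGER,        -- columns are derived from the input flow
    "name" VARCHAR(255)
);

INSERT INTO "my_schema"."target_table"   -- bulk-insert at runtime
SELECT "id", "name"
FROM "my_schema"."input_flow";
```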

