Nested Data Load

This component takes an external table that contains a nested data structure and converts it to a standard table, unpacking the nested structure into a flatter, more practical form.

Users should be aware of the Create External Table component and its Metadata property, which allows for nested external tables to be created.
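For illustration, a nested external table of the kind this component reads might be declared as in the sketch below. The schema, table, and column names (spectrum_schema, customers, and so on) are assumptions for the example only, not anything the component requires.

    -- Sketch: a nested external table with struct and array columns.
    CREATE EXTERNAL TABLE spectrum_schema.customers (
        id     INT,
        name   STRUCT<given:VARCHAR(20), family:VARCHAR(20)>,
        phones ARRAY<VARCHAR(20)>,
        orders ARRAY<STRUCT<shipdate:TIMESTAMP, price:DOUBLE PRECISION>>
    )
    STORED AS PARQUET
    LOCATION 's3://my-bucket/nested_example/customers/';

    -- The component unpacks such a table into a standard table, roughly as:
    SELECT c.id, c.name.given, c.name.family, o.shipdate, o.price
    FROM spectrum_schema.customers c
    LEFT JOIN c.orders o ON TRUE;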


Properties

Name = string

A human-readable name for the component.


Schema = drop-down

Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.


Input Schema = drop-down

The schema for the nested external table to be used as a source.

External tables require external schemas and regular schemas will not work. To learn more about external schemas, read Configuring The Matillion ETL Client from Getting Started With Amazon Redshift Spectrum. The special value, [Environment Default], will use the schema defined in the environment.
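For reference, an external schema of the kind this property expects is typically created with CREATE EXTERNAL SCHEMA. A minimal sketch, in which the database name and IAM role ARN are placeholders:

    -- Sketch: database name and IAM role ARN are placeholders.
    CREATE EXTERNAL SCHEMA spectrum_schema
    FROM DATA CATALOG
    DATABASE 'spectrum_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;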


Input Table = string

The name of the nested external table to be used as a source.


Target Schema = drop-down

The schema for the target table. This is a non-external (standard) schema.


New Table Name = string

The name of the target table to create or replace.


Column Names = data structure

A representation of the nested data structure that lets users select, via checkboxes, the structures, arrays, and fields that they wish to load.


Column Aliases = column editor

Select a column name from the choices available in the Column Names property and assign an arbitrary new name for that column as it will appear in the target table.


Data Source Filter = column editor

  • Input Column: Select an input column. The available input columns vary depending upon the data source.
  • Qualifier:
      • Is: Compares the column to the value using the comparator.
      • Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", and so on.
  • Comparator: Choose a method of comparing the column to the value. Possible comparators include: "Equal to", "Greater than", "Less than", "Greater than or equal to", "Less than or equal to", "Like", and "Null". "Equal to" can match exact strings and numeric values, while other comparators, such as "Greater than" and "Less than", work only with numeric values. The "Like" comparator allows the wildcard character [%] to be used at the start and end of a string value to match a column. The "Null" comparator matches only null values, ignoring whatever the value is set to. Not all data sources support all comparators, so it is likely that only a subset of the above comparators will be available to choose from.
  • Value: The value to be compared (a sketch of the equivalent SQL follows this list).
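As a rough guide, each filter row contributes one predicate to the WHERE clause of the load query. A sketch, assuming hypothetical columns named price and status:

    -- Sketch: how filter rows map to predicates (column names are hypothetical).
    -- Is / Greater than / 100:
    WHERE price > 100
    -- Not / Like / 'pending%':
    WHERE status NOT LIKE 'pending%'
    -- Is / Null (the Value entry is ignored):
    WHERE status IS NULL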

Limit = integer

Set a numeric value to limit the number of rows that are loaded.
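In SQL terms this corresponds to a LIMIT clause. A sketch, reusing the hypothetical customers table from above:

    -- Sketch: load at most 1000 rows.
    SELECT c.id FROM spectrum_schema.customers c LIMIT 1000;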


Joins = column editor

  • Name: Array data is loaded as though it were a table in its own right and thus must be joined to the other incoming data. These joins are LEFT OUTER by default, but can be set per array by selecting the array name here.
  • Type: The join type specifies how the join is to be performed. The default is LEFT OUTER (a sketch of the generated join follows this list).
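A minimal sketch of the kind of join generated when unpacking an array, reusing the hypothetical customers table from above:

    -- Sketch: LEFT OUTER keeps customers with no orders; INNER would drop them.
    SELECT c.id, o.shipdate
    FROM spectrum_schema.customers c
    LEFT JOIN c.orders o ON TRUE;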

Create/Replace = drop-down

  • Create: Create the new table with the given name. Will fail if a table of that name already exists.
  • Create if not exists: Will create the new table with the given name unless one already exists, and will succeed and continue in either case. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct it, which could lead to errors later in the job if you did not expect the table to already exist, or if its schema differs from the one defined in this component.
  • Replace: Will create the new table, potentially overwriting any existing table of the same name.

Since other database objects may depend upon this table, drop ... cascade is used when replacing, which may actually remove many other database objects (see the sketch below).
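In SQL terms the three options correspond roughly to the statements below; the table and column definitions are illustrative only.

    -- Create:
    CREATE TABLE flat_customers (id INT, given VARCHAR(20));
    -- Create if not exists:
    CREATE TABLE IF NOT EXISTS flat_customers (id INT, given VARCHAR(20));
    -- Replace (note CASCADE, which also drops dependent objects such as views):
    DROP TABLE IF EXISTS flat_customers CASCADE;
    CREATE TABLE flat_customers (id INT, given VARCHAR(20));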


Distribution Style = drop-down

  • All: Copy rows to all nodes in the Redshift cluster.
  • Auto: (Default) Allow Redshift to manage your distribution style.
  • Even: Distribute rows around the Redshift cluster evenly.
  • Key: Distribute rows around the Redshift cluster according to the value of a key column.

Note

Table distribution is critical to good performance. Read the Distribution styles documentation for more information.
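For reference, the chosen style appears in the generated DDL roughly as in the sketch below; the table and key column are illustrative.

    -- Sketch: Key distribution co-locates rows that share the same id value.
    CREATE TABLE flat_customers (id INT, given VARCHAR(20))
    DISTSTYLE KEY DISTKEY (id);
    -- The alternatives are DISTSTYLE ALL, DISTSTYLE AUTO, and DISTSTYLE EVEN.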


Sort Key = dual listbox

This is optional, and lets users specify one or more columns from the input that should be set as the table's sort key.

Note

Sort keys are critical to good performance. Read Working with sort keys for more information.


Sort Key Options = drop-down

Decide whether the sort key is of a compound or interleaved variety.
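In the generated DDL the two varieties appear roughly as in the sketch below; the table and sort columns are illustrative.

    -- Compound: column order matters; suits queries filtering on the leading column.
    CREATE TABLE flat_customers (id INT, shipdate TIMESTAMP)
    COMPOUND SORTKEY (shipdate, id);
    -- Interleaved: gives each sort column equal weight; suits varied filter patterns.
    -- INTERLEAVED SORTKEY (shipdate, id);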


Primary Keys = dual listbox

Select one or more columns to be designated as the table's primary key.
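Note that Redshift does not enforce primary key constraints; they are informational and used by the query planner. A sketch, with an illustrative table:

    -- Sketch: primary keys in Redshift are informational, not enforced.
    CREATE TABLE flat_customers (
        id    INT,
        given VARCHAR(20),
        PRIMARY KEY (id)
    );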


Backup Table = drop-down

Specify whether the created table is to be included in automated and manual cluster snapshots. The "BACKUP NO" setting has no effect on automatic replication of data to other nodes within the cluster, so tables with "BACKUP NO" specified are restored in the event of a node failure. The default is "BACKUP YES".
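In the DDL this corresponds to the BACKUP clause, as in the sketch below (table illustrative):

    -- Sketch: exclude the table from cluster snapshots.
    CREATE TABLE flat_customers (id INT, given VARCHAR(20)) BACKUP NO;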

