
DynamoDB Load

Overview

Load data into an existing table from data stored in Amazon DynamoDB. This component uses Amazon Redshift's native ability to copy directly from a DynamoDB table, unlike the similar DynamoDB Query component, which goes via a third-party driver. As a general rule, use this component to load full data sets, and use DynamoDB Query for more selective data loads.
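
For reference, the underlying mechanism is a Redshift COPY command that reads straight from DynamoDB. Below is a minimal sketch of that kind of statement; the table names and role ARN are placeholders, and the exact statement Matillion ETL generates may differ:

    COPY public.favorite_movies
    FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50;

Most of the properties below correspond to COPY parameters. The sketches throughout this document reuse the same placeholder names and are illustrative only, not the exact statements the component builds.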

This component requires working AWS Credentials with read access to the DynamoDB table containing the source data. This is easily achieved by attaching an IAM role to the instance when launching Matillion ETL for Redshift; however, it can also be managed manually by editing an environment.
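
Where credentials are managed manually in the environment rather than via an instance role, the COPY statement can authorize with an access key pair instead of a role ARN; a sketch with placeholder keys:

    COPY public.favorite_movies
    FROM 'dynamodb://ProductCatalog'
    ACCESS_KEY_ID '<access-key-id>'
    SECRET_ACCESS_KEY '<secret-access-key>'
    READRATIO 50;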


Properties

Name = string

A human-readable name for the component.


Schema = drop-down

Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.


Target Table = string

An existing table to load data into.


Load Columns = dual listbox

Use the arrow buttons to add columns to the load.
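
The selected columns become the column list of the generated COPY, and DynamoDB attribute names are matched against those column names (the matching is case-insensitive). A sketch with hypothetical columns:

    COPY public.favorite_movies (movie_id, title, rating)
    FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50;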


DynamoDB Table = string

The name of the DynamoDB table to read from.


Read Ratio = integer

The percentage of the DynamoDB table's provisioned throughput to use for the data load. To understand the trade-offs in changing this number, read Loading data from an Amazon DynamoDB table.
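
This maps to the COPY READRATIO parameter, which is required when loading from DynamoDB. As a worked example with hypothetical figures: against a table provisioned with 100 read capacity units, a read ratio of 50 allows the load to consume up to 50 of those units, leaving the rest for other readers of the table:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50;  -- consume at most half of the provisioned read throughput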


Explicit IDs = drop-down

Whether or not to load data from the source DynamoDB table into an IDENTITY column. Read Load VENUE with explicit values for an IDENTITY column to learn more.
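
This corresponds to the COPY EXPLICIT_IDS parameter: when set, IDENTITY column values are taken from the source data instead of being auto-generated. A sketch, assuming a hypothetical venue table whose venueid column is an IDENTITY column:

    COPY public.venue (venueid, venuename)
    FROM 'dynamodb://Venue'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    EXPLICIT_IDS;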


Region = drop-down

The AWS region hosting the DynamoDB table. This is not normally required and can be left as None if the table is in the same region as your Amazon Redshift cluster.
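
When the source table is in a different region, this surfaces as the COPY REGION parameter; the region shown below is hypothetical:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    REGION 'eu-west-1';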


Maximum Errors = integer

The maximum number of individual parsing errors tolerated before the whole load fails. Rows with errors up to this limit are skipped rather than loaded. This value defaults to 0 (also the Amazon Redshift default), meaning any error fails the load.
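
This maps to the COPY MAXERROR parameter; the threshold of 10 below is hypothetical:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    MAXERROR 10;  -- skip up to 10 bad rows before failing the load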


Date Format = string

Defaults to auto. This can be used to manually specify a date format.


Time Format = string

Defaults to auto. This can be used to manually specify a time format.
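
Date Format and Time Format correspond to the COPY DATEFORMAT and TIMEFORMAT parameters; the format strings below are hypothetical examples:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    DATEFORMAT 'YYYY-MM-DD'
    TIMEFORMAT 'YYYY-MM-DD HH:MI:SS';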


Accept Any Date = boolean

If this is enabled, invalid dates such as '45-65-2018' are not treated as an error; instead, they are loaded as NULL.
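
This corresponds to the COPY ACCEPTANYDATE parameter, shown here together with DATEFORMAT 'auto':

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    DATEFORMAT 'auto'
    ACCEPTANYDATE;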


Truncate Columns = boolean

If this is set, any instance of data in the input that is too long to fit into the specified target column width will be truncated to fit instead of causing an error.


Trim Blanks = boolean

Removes trailing and leading whitespace from the input data.


Empty As Null = boolean

If this is set, empty columns in the input will be loaded as NULL.


Blanks As Null = boolean

If this is set, blank columns (consisting only of whitespace) in the input will be loaded as NULL.
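
Truncate Columns, Trim Blanks, Empty As Null, and Blanks As Null correspond to the COPY parameters TRUNCATECOLUMNS, TRIMBLANKS, EMPTYASNULL, and BLANKSASNULL respectively; a sketch enabling all four:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    TRUNCATECOLUMNS
    TRIMBLANKS
    EMPTYASNULL
    BLANKSASNULL;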


Comp Update = drop-down

Whether to apply automatic compression to the target table. The default is On.


Stat Update = drop-down

Governs automatic computation and refresh of optimizer statistics at the end of a successful COPY command.
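
Comp Update and Stat Update surface as the COPY COMPUPDATE and STATUPDATE parameters:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    COMPUPDATE ON
    STATUPDATE ON;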


Round Decimals = boolean

If this option is set, decimal values are rounded to fit into the column when the input data has more decimal places than are defined for the target column.
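
This maps to the COPY ROUNDEC parameter:

    COPY public.favorite_movies FROM 'dynamodb://ProductCatalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    READRATIO 50
    ROUNDEC;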

