DynamoDB Load
Overview
Load data into an existing table from objects stored in Amazon DynamoDB. This uses an innate ability of DynamoDB to push to Amazon Redshift, unlike the similar DynamoDB Query component, which goes via a third-party driver. It is generally advised that for pushing full data sets, this component should be used, whereas DynamoDB Query should be used for more selective data loads.
This component requires working AWS Credentials with read access to the EMR cluster containing the source data files. The is easily achieved by attaching an IAM role to the instance when launching Matillion ETL for Redshift; however, it can also be managed manually by editing an environment.
Properties
Name
= string
A human-readable name for the component.
Schema
= drop-down
Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.
Target Table
= string
An existing table to load data into.
Load Columns
= dual listbox
Use the arrow buttons to add columns into the load.
DynamoDB Table
= string
The name of the DynamoDB table to read from.
Read Ratio
= integer
The percentage of the DynamoDB table's provisioned throughput to use for the data load. To understand the trade-offs in changing this number, read Loading data from an Amazon DynamoDB table.
Explicit IDs
= drop-down
Whether or not to load data from the S3 objects into an IDENTITY column. Read Load VENUE with explicit values for an IDENTITY column to learn more.
Region
= drop-down
The Amazon S3 region hosting the S3 bucket. This is not normally required and can be left as None if the bucket is in the same region as your Amazon Redshift cluster.
Maximum Errors
= integer
The maximum number of individual parsing errors that cause the whole load to fail. Values up to this will be substituted as null values. This value defaults to 0, but the Amazon default is 1000.
Date Format
= string
Defaults to auto. This can be used to manually specify a date format.
Time Format
= string
Defaults to auto.
This can be used to manually specify a time format.
Accept Any Date
= boolean
If this is enabled, invalid dates such as '45-65-2018' are not considered an error, but will be loaded as the null value.
Truncate Columns
= boolean
If this is set, any instance of data in the input file that is too long to fit into the specified target column width will be truncated to fit instead of causing an error.
Trim Blanks
= boolean
Removes trailing and leading whitespace from the input data.
Empty As Null
= boolean
If this is set, empty columns in the input file will become NULL.
Blanks As Null
= boolean
If this is set, blank columns in the input file will become NULL.
Comp Update
= drop-down
Apply automatic compression to the target table. Default is On.
Stat Update
= drop-down
Governs automatic computation and refresh of optimizer statistics at the end of a successful COPY command.
Round Decimals
= boolean
If this option is set, round any decimals to fit into the column when the number of decimal places in the input data is larger than defined for the target column.
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
❌ | ❌ | ✅ | ❌ | ❌ |