JDBC Incremental Load
This article is part of the series on incremental load tools and shared jobs.
The JDBC Incremental Load component lets users easily set up a shared job that incrementally loads data from JDBC-compliant databases, rather than building such a job manually, which would require significantly more expertise.
Schedule your incremental load jobs to run periodically so that they continually update the created tables. To learn more about scheduling, read Manage Schedules.
In the Components panel, type "JDBC incremental load" to locate the component, then drag it onto the canvas. The wizard opens when you drop the component.
JDBC incremental load setup (Snowflake)
Complete the following six pages in the wizard.
1. Database Selection
Page 1 of the wizard explains its purpose and collects your database connection details.
- Database Type: Select the type of database to connect to. The available database types are:
  - IBM DB2
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Connection URL: Provide your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL as the database type provides this template (a connectivity check sketch follows this list):
  jdbc:postgresql://<host>/<database>
- Username: Provide the username for the database.
- Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
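If the wizard rejects your connection details, it can help to verify the URL and credentials outside Matillion ETL first. The following is a minimal sketch in Java, assuming the PostgreSQL JDBC driver is on the classpath; the host, database, and credentials shown are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcConnectionTest {
    public static void main(String[] args) throws Exception {
        // Hypothetical placeholders; substitute your own host, database, and credentials.
        String url = "jdbc:postgresql://db.example.com:5432/sales";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret")) {
            // A successful metadata call confirms a live session with the source database.
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}
```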
Click Next.
2. Connection Options
Page 2 of the wizard is for setting connection options, defined as parameter-value pairs (an example is sketched below). To add a new connection option, click +.
The available connection options depend on the database type selected on the previous page.
Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.
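For illustration, a PostgreSQL source accepts standard PostgreSQL JDBC driver options as parameter-value pairs; the values below are hypothetical examples:

```
ssl            = true
connectTimeout = 10
currentSchema  = public
```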
Click Next.
3. Data Sources
Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 shows a Success message at the top; otherwise it shows an error such as "No Database Type specified".
Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.
Click Next.
4. Data Selection
Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.
Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can mark columns as Incremental (typically a last-modified timestamp or an ever-increasing key used to detect new and changed rows) as well as define the Primary key for the table.
Click Next.
5. Staging Configuration
On page 5 of the wizard you will specify data staging details, as follows:
Property | Type | Description |
---|---|---|
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Staging Warehouse | drop-down | Select the staging warehouse. |
Staging Database | drop-down | Select the staging database. |
Staging Schema | drop-down | Select the staging schema. |
Click Next.
6. Target Configuration
On page 6 of the wizard you will specify target data warehouse details, as follows:
Property | Type | Description |
---|---|---|
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Warehouse | drop-down | Select the target warehouse. |
Target Database | drop-down | Select the target database. |
Target Schema | drop-down | Select the target schema. |
Concurrency | drop-down | Select whether to load data using the Concurrent or Sequential method. |
Click Create & Run to finish the setup, or click Back to revisit earlier wizard pages.
JDBC incremental load setup (Delta Lake on Databricks)
Complete the following six pages in the wizard.
1. Database Selection
Page 1 of the wizard explains its purpose and collects your database connection details.
- Database Type: Select the type of database to connect to. The available database types are:
  - IBM DB2
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Connection URL: Provide your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL as the database type provides this template:
  jdbc:postgresql://<host>/<database>
- Username: Provide the username for the database.
- Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
Click Next.
2. Connection Options
Page 2 of the wizard is for setting connection options, defined as parameter-value pairs. To add a new connection option, click +.
The available connection options depend on the database type selected on the previous page.
Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.
Click Next.
3. Data Sources
Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 shows a Success message at the top; otherwise it shows an error such as "No Database Type specified".
Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.
Click Next.
4. Data Selection
Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.
Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can mark columns as Incremental (typically a last-modified timestamp or an ever-increasing key used to detect new and changed rows) as well as define the Primary key for the table.
Click Next.
5. Staging Configuration
On page 5 of the wizard you will specify data staging details, as follows:
Property | Type | Description |
---|---|---|
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Staging Catalog | drop-down | Select a Databricks Unity Catalog. |
Staging Database | drop-down | Select the staging database. |
Staging Schema | drop-down | Select the schema via which tables will be staged. |
Click Next.
6. Target Configuration
On page 6 of the wizard you will specify target data warehouse details, as follows:
Property | Type | Description |
---|---|---|
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Catalog | drop-down | Select a Databricks Unity Catalog. |
Target Database | drop-down | Select the target database. |
Concurrency | drop-down | Select whether to load data using the Concurrent or Sequential method. |
Click Create & Run to finish the setup, or click Back to revisit earlier wizard pages.
JDBC incremental load setup (Redshift)
Complete the following five pages in the wizard.
1. Database Selection
Page 1 of the wizard explains its purpose and collects your database connection details.
- Database Type: Select the type of database to connect to. The available database types are:
  - IBM DB2
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Connection URL: Provide your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL as the database type provides this template:
  jdbc:postgresql://<host>/<database>
- Username: Provide the username for the database.
- Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
Click Next.
2. Connection Options
Page 2 of the wizard is for setting connection options, defined as parameter-value pairs. To add a new connection option, click +.
The available connection options depend on the database type selected on the previous page.
Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.
Click Next.
3. Data Sources
Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 shows a Success message at the top; otherwise it shows an error such as "No Database Type specified".
Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.
Click Next.
4. Data Selection
Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.
Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can mark columns as Incremental (typically a last-modified timestamp or an ever-increasing key used to detect new and changed rows) as well as define the Primary key for the table.
Click Next to move to the final page of the wizard.
5. Configuration
On page 5 of the wizard you will specify data warehouse details, as follows:
Property | Type | Description |
---|---|---|
Staging Bucket | drop-down | Select the S3 bucket for data staging. The available buckets depend on the selected Redshift cluster. |
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Stage Schema | drop-down | Select the Redshift schema via which tables will be staged. |
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Schema | drop-down | Select the Redshift schema into which tables will be loaded. |
Target Distribution Style | drop-down | Select the distribution style: All copies rows to all nodes in the Redshift cluster; Even distributes rows evenly across the cluster. The default setting is Even. |
Concurrency | drop-down | Select whether to load data using the Concurrent or Sequential method. |
Click Create & Run to finish the setup, or click Back to revisit earlier wizard pages.
Enable schema drift
Schema drift support accommodates changes made to the source data, such as:
- Missing columns as a result of changes in the source schema. Missing columns are loaded as NULL.
- Data type changes for specified columns in the shared job configuration. For further information, see below.
- Tables no longer present in the source schema. Missing tables are no longer loaded, but your shared job will fail; all other tables specified in the shared job configuration will still be loaded. If this occurs, edit the shared job's table and columns grid variable to remove the missing table.
- Manual addition of new tables or columns via the table and columns grid variable within the existing shared job configuration. If a new table or column is added to your source, it is not added to the shared job configuration by default; however, new tables or columns can be added manually.
Data type changes are also accommodated. If a change is not compatible with the target cloud platform, the current column is renamed as <column_name>_<datetime> and a new column with the original name is created with the new data type. The format of the datetime suffix is _yyyymmddhhmmss, for example _20210113110334, and it is the same for all columns in the same table in the same shared job configuration. The new column will be NULL up to the date of change, which must be considered for downstream dependencies such as views and reports. A worked example follows.
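As a hypothetical illustration: suppose the source column order_total changes from an integer to a string type, and the shared job detects the change on 2021-01-13 at 11:03:34. The target table would then contain:

```
order_total_20210113110334   -- the original integer column, holding values loaded before the change
order_total                  -- the new string column; NULL for rows loaded before the change
```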
Upon completion of the wizard, open the component's properties and click ... next to Automatically Update Target Metadata. Replace "No" by typing "Yes" into the text field provided, then click OK to save the change and enable schema drift support.