JDBC Incremental Load
This article is part of the series on incremental load tools and shared jobs.
The JDBC Incremental Load component lets users easily set up a shared job that incrementally loads data from JDBC-compliant databases, rather than building such a job manually, which would require significantly more expertise.
Schedule your incremental load jobs to run periodically so that they continually update the created tables. To learn more about scheduling, read Manage Schedules.
In the Components panel, type "JDBC incremental load" to locate the component, then drag it onto the canvas. The wizard opens when you drop the component.
JDBC incremental load setup (Snowflake)
Complete the following six pages in the wizard.
1. Database Selection
Page 1 of the wizard explains its purpose and collects your database connection details.
- Database Type: Select the type of database to connect to. The available database types are:
  - IBM DB2
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Connection URL: Provide your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL as the database type provides this template (a connectivity check sketch follows this list):
  jdbc:postgresql://<host>/<database>
- Username: Provide the username for the database.
- Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
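If the wizard rejects your connection details, it can help to verify the URL and credentials outside Matillion ETL first. The following is a minimal sketch in Java, assuming the PostgreSQL JDBC driver is on the classpath; the host, database, and credentials shown are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcConnectionTest {
    public static void main(String[] args) throws Exception {
        // Hypothetical placeholders; substitute your own host, database, and credentials.
        String url = "jdbc:postgresql://db.example.com:5432/sales";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret")) {
            // A successful metadata call confirms a live session with the source database.
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}
```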
Click Next.
2. Connection Options
Page 2 of the wizard is for setting connection options, defined as parameter-value pairs (an example is sketched below). To add a new connection option, click +.
The available connection options depend on the database type selected on the previous page.
Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.
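For illustration, a PostgreSQL source accepts standard PostgreSQL JDBC driver options as parameter-value pairs; the values below are hypothetical examples:

```
ssl            = true
connectTimeout = 10
currentSchema  = public
```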
Click Next.
3. Data Sources
Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 shows a Success message at the top; otherwise it shows an error such as "No Database Type specified".
Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.
Click Next.
4. Data Selection
Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.
Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can mark columns as Incremental (typically a last-modified timestamp or an ever-increasing key used to detect new and changed rows) as well as define the Primary key for the table.
Click Next.
5. Staging Configuration
On page 5 of the wizard you will specify data staging details, as follows:
Property | Type | Description |
---|---|---|
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Staging Warehouse | drop-down | Select the staging warehouse. |
Staging Database | drop-down | Select the staging database. |
Staging Schema | drop-down | Select the staging schema. |
Click Next.
6. Target Configuration
On page 6 of the wizard you will specify target data warehouse details, as follows:
Property | Type | Description |
---|---|---|
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Warehouse | drop-down | Select the target warehouse. |
Target Database | drop-down | Select the target database. |
Target Schema | drop-down | Select the target schema. |
Concurrency | drop-down | Select whether to load data using the Concurrent or Sequential method. |
Click Create & Run to finish the setup, or click Back to revisit earlier wizard pages.
JDBC incremental load setup (Delta Lake on Databricks)
Complete the following six pages in the wizard.
1. Database Selection
Page 1 of the wizard explains its purpose and collects your database connection details.
- Database Type: Select the type of database to connect to. The available database types are:
  - IBM DB2
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Connection URL: Provide your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL as the database type provides this template:
  jdbc:postgresql://<host>/<database>
- Username: Provide the username for the database.
- Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
Click Next.
2. Connection Options
Page 2 of the wizard is for setting connection options, defined as parameter-value pairs. To add a new connection option, click +.
The available connection options depend on the database type selected on the previous page.
Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.
Click Next.
3. Data Sources
Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 shows a Success message at the top; otherwise it shows an error such as "No Database Type specified".
Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.
Click Next.
4. Data Selection
Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.
Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can mark columns as Incremental (typically a last-modified timestamp or an ever-increasing key used to detect new and changed rows) as well as define the Primary key for the table.
Click Next.
5. Staging Configuration
On page 5 of the wizard you will specify data staging details, as follows:
Property | Type | Description |
---|---|---|
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Staging Catalog | drop-down | Select a Databricks Unity Catalog. |
Staging Database | drop-down | Select the staging database. |
Staging Schema | drop-down | Select the schema via which tables will be staged. |
Click Next.
6. Target Configuration
On page 6 of the wizard you will specify target data warehouse details, as follows:
Property | Type | Description |
---|---|---|
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Catalog | drop-down | Select a Databricks Unity Catalog. |
Target Database | drop-down | Select the target database. |
Concurrency | drop-down | Select whether to load data using the Concurrent or Sequential method. |
Click Create & Run to finish the setup, or click Back to revisit earlier wizard pages.
JDBC incremental load setup (Redshift)
Complete the following five pages in the wizard.
1. Database Selection
Page 1 of the wizard explains its purpose and collects your database connection details.
- Database Type: Select the type of database to connect to. The available database types are:
  - IBM DB2
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Connection URL: Provide your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL as the database type provides this template:
  jdbc:postgresql://<host>/<database>
- Username: Provide the username for the database.
- Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
Click Next.
2. Connection Options
Page 2 of the wizard is for setting connection options, defined as parameter-value pairs. To add a new connection option, click +.
The available connection options depend on the database type selected on the previous page.
Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.
Click Next.
3. Data Sources
Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 shows a Success message at the top; otherwise it shows an error such as "No Database Type specified".
Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.
Click Next.
4. Data Selection
Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.
Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can mark columns as Incremental (typically a last-modified timestamp or an ever-increasing key used to detect new and changed rows) as well as define the Primary key for the table.
Click Next to move to the final page of the wizard.
5. Configuration
On page 5 of the wizard you will specify data warehouse details, as follows:
Property | Type | Description |
---|---|---|
Staging Bucket | drop-down | Select the S3 bucket for data staging. The available buckets depend on the selected Redshift cluster. |
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Stage Schema | drop-down | Select the Redshift schema via which tables will be staged. |
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Schema | drop-down | Select the Redshift schema into which tables will be loaded. |
Target Distribution Style | drop-down | Select the distribution style: All copies rows to all nodes in the Redshift cluster; Even distributes rows evenly across the cluster. The default setting is Even. |
Concurrency | drop-down | Select whether to load data using the Concurrent or Sequential method. |
Click Create & Run to finish the setup, or click Back to revisit earlier wizard pages.
Enable schema drift
Schema drift support accommodates changes made to the source data, such as:
- Missing columns as a result of changes in the source schema. Missing columns are loaded as NULL.
- Data type changes for specified columns in the shared job configuration. For further information, see below.
- Tables no longer present in the source schema. Missing tables are no longer loaded, but your shared job will fail; all other tables specified in the shared job configuration will still be loaded. If this occurs, edit the shared job's table and columns grid variable to remove the missing table.
- Manual addition of new tables or columns via the table and columns grid variable within the existing shared job configuration. If a new table or column is added to your source, it is not added to the shared job configuration by default; however, new tables or columns can be added manually.
Data type changes are also accommodated. If a change is not compatible with the target cloud platform, the current column is renamed as <column_name>_<datetime> and a new column with the original name is created with the new data type. The format of the datetime suffix is _yyyymmddhhmmss, for example _20210113110334, and it is the same for all columns in the same table in the same shared job configuration. The new column will be NULL up to the date of change, which must be considered for downstream dependencies such as views and reports. A worked example follows.
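As a hypothetical illustration: suppose the source column order_total changes from an integer to a string type, and the shared job detects the change on 2021-01-13 at 11:03:34. The target table would then contain:

```
order_total_20210113110334   -- the original integer column, holding values loaded before the change
order_total                  -- the new string column; NULL for rows loaded before the change
```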
Upon completion of the wizard, open the component's properties and click ... next to Automatically Update Target Metadata. Replace "No" by typing "Yes" into the text field provided, then click OK to save the change and enable schema drift support.