Db2 for IBM i snapshot

When setting up a pipeline, you can enable or disable the snapshot feature. A snapshot is a record of the source tables taken at the time the snapshot is completed. Changes after this point are not recorded, unless specified. The snapshot type and affected tables can be changed after a pipeline has started by stopping the pipeline and editing the configuration.

For the Db2 IBM i connector, there are two snapshot configurations available:

Initial snapshot: A snapshot is taken when the pipeline first starts.
On demand snapshot: Additional snapshots are taken whenever new tables are added to the capture list.

The Pipeline summary shows the snapshot type and a list of tables included in the snapshot.

Initial snapshot

Note

For snapshot purposes, a pipeline is considered to start for the first time when the target location is empty or the last known location in the database logs is unknown or invalid.

With initial snapshots enabled, the connector will fetch all historical data from the tables it has been configured to capture changes from. While an initial snapshot is taking place, the pipeline will have a SNAPSHOTTING status. The pipeline has not started streaming at this point.

If a pipeline is stopped before the snapshot concludes, an initial snapshot will start again when the pipeline is restarted.

After the snapshot is finished, the pipeline will have a STREAMING status. Any changes that have occurred on the database after the pipeline was started will be streamed and from this point onward, the target will be in sync with the source database.

If the initial snapshot option is disabled, a pipeline will immediately have a STREAMING status when it is started, and only changes to the database from that point onward will be streamed to the target.

On-demand snapshot

The on-demand snapshot configuration allows future snapshots to be triggered on a pipeline whenever new tables are added to the capture list. This allows pipelines to dynamically expand the tables being captured without needing to re-snapshot tables that have already been captured.

Upon enabling on-demand snapshots, any tables that are added to a pipeline by editing it after it has initially run will be automatically snapshotted. While an on-demand snapshot is taking place, the pipeline will display a SNAPSHOTTING status.

If a pipeline is stopped for any reason while an on-demand snapshot is taking place, the snapshot will resume when the pipeline next restarts.

Once the pipeline has finished snapshotting, it will return to a STREAMING status. After this point, all tables in the target location will be in sync with the source.

Note

An on-demand snapshot cannot be triggered on tables without a primary key.
Currently, on-demand snapshot only supports a single journal. You can't select on-demand snapshot if there is more than one journal configured for the selected tables.
Schema changes for new tables while a pipeline has a SNAPSHOTTING status should be avoided and will cause the pipeline to fail if they occur.
Schema changes for new tables in the time window between stopping, editing, and restarting an edited pipeline should be avoided and will cause the pipeline to fail if they occur.

To enable on-demand snapshots for Db2 IBM i, a signal table must be configured on the source database. This table is tracked by the connector and will automatically trigger snapshots when it receives a signal to do so.

Setting up a signal table

A signal table should be pipeline specific to avoid accidental overlap of snapshot executions.
The connection details provided for the source must have write permissions to the signal table. The pipeline will insert a row into this table to trigger a snapshot.
The table should be created with the correct structure.
The table must be tracked by the same journal as the rest of the tables selected—this will allow the pipeline to react to signals and trigger any necessary snapshot actions.

The following actions should be taken to ensure the signal table is configured correctly and can be used to enable on-demand snapshots:

The table should be created with the following SQL Statement:
```
CREATE TABLE <schema>.<table> (
id VARCHAR(42) NOT NULL PRIMARY KEY, 
type VARCHAR(32) NOT NULL, 
data VARCHAR(2048)
);
```
Replace <schema> and <table> in accordance with your database naming convention.

The signal table should have three columns:
- id: This column serves as an identifier for each signal execution and should be of type VARCHAR(42).
- type: This column indicates the type of signal being sent and should be of type VARCHAR(32).
- data: This column stores additional data related to the signal and should be of type VARCHAR(2048).
Start journaling for the signal table, where <schema> is the schema, <table> is the signal table name and <journal-name> is name of the journal that will record changes to the signal table:
```
    STRJRNPF FILE(<schema>/<table>) JRN(<schema>/<journal-name>) IMAGES(*BOTH)
```
The database role that is configured for running a pipeline will need to have the expected privileges on the signal table. The on-demand snapshot requires USAGE on the schema containing the signal table and INSERT on the signal table.
- SQL USAGE privileges on the schema containing the signal table:
```
    GRANT USAGE ON <schema> TO <user>;
```
Where <schema> is the schema name and <user> is the user account name.
- SQL INSERT privileges on the signal table:
```
    GRANT INSERT ON <schema>.<table> TO <user>;
```
Where <schema> is the schema name, <table> is the signal table name, and <user> is the user account name.

By correctly configuring a signal table, the on-demand snapshots feature can be enabled and new tables added to the pipeline after the initial startup is snapshotted.

Configuring new tables

When adding tables to a pipeline, they must be configured in the same way as instructed when first setting up the pipeline. Read Configuring Db2 for IBM i for a comprehensive outline of these requirements.

If on-demand snapshots are disabled, any new tables added to a pipeline will have their changes captured from that point onwards, but will not sync their historical data with the target and the pipeline will immediately have a STREAMING status.

Migrating tables to a single journal

You may already have tables configured to be tracked by different journals—even though on-demand snapshots currently only support a single journal. You can change the journal associated with a table by running the following commands:

End journaling for a table:
```
    ENDJRNPF FILE(<schema>/<table>)
```
Where <schema> is the schema name and <table> is the table name to end journal tracking for.
Restart journal tracking for a table under a predefined journal:
```
    STRJRNPF FILE(<schema>/<table>) JRN(<schema>/<journal-name>) IMAGES(*BOTH)
```
Where <schema> is the schema, <table> is the table name for the table we are moving to the journal, and <journal-name> is name of the journal that will record changes for the table going forwards.