When setting up a pipeline using the MySQL connector, you can enable or disable snapshotting features. Snapshotting can be modified after a pipeline has started by stopping the pipeline and editing its configuration.
The MySQL connector offers two snapshot configurations:
- Initial snapshot.
- On demand snapshot.
This configuration allows you to take an initial snapshot of the specified tables when the pipeline is started. The initial snapshot captures the current state of the data in those tables at the beginning of the pipeline execution. By enabling this option, you ensure that the pipeline starts with the most recent data available. The initial snapshot setting will only effect a pipeline on its first startup.
- Initial snapshot effect: The initial snapshot setting only impacts the pipeline during its first startup.
- First startup determination: A pipeline is considered to be in its first startup under the following conditions:
- The target location is empty.
- The last known location in the database logs is unknown or invalid.
- Initial snapshots enabled: If initial snapshots are enabled, the connector will fetch all historical data from the configured tables. During the initial snapshot process, the pipeline will have a SNAPSHOTTING status. If the pipeline is stopped before the snapshot concludes, the initial snapshot will start again when the pipeline is restarted.
- Snapshot completion and streaming: Once the initial snapshot is finished, the pipeline transitions to a STREAMING status. After this point, any changes that occur in the database will be streamed, ensuring that the target remains in sync with the source database.
- Initial snapshots disabled: If the initial snapshot option is disabled, the pipeline will immediately have a STREAMING status when it is started. Only changes made to the database from that point on will be streamed to the target.
On demand snapshot
Enabling on demand snapshot allows you to trigger snapshots for specific tables in the pipeline manually. This means that new tables can be added to the capture list after the pipeline's initial startup, and snapshots for those tables can be triggered as needed. On demand snapshots are useful when you want to capture specific points in time or when you need to dynamically expand the set of tables being captured without losing historical data.
On demand snapshots require a minimum agent version of
- Dynamic table addition: Enabling on demand snapshots lets to trigger snapshots for newly added tables in the pipeline. This ensures that historical data from the newly added tables is captured without any loss.
- Table privileges for on demand snapshots: If you wish to add a table that was created before the initial startup of the pipeline, the cdc user must have had the correct privileges on that table beforehand. If the correct privileges were not present at the start of the initial pipeline execution, a fresh initial snapshot must be completed to add the new table. Otherwise, the pipeline will fail when receiving a change event from the newly added table due to the MySQL connector not having the correct knowledge of the table's structure.
- On Demand snapshot process: While an on demand snapshot is in progress, the pipeline will display a SNAPSHOTTING status. If the pipeline is stopped during the snapshot process, the snapshot will resume on pipeline restart.
- Snapshot completion and streaming: After the on demand snapshot is finished, the pipeline returns to a STREAMING status. At this point, all tables in the target location will be in sync with the source.
- Signal table requirement: To enable on demand snapshots for MySQL, a signal table must be configured in the source database. This table is tracked by the connector and triggers snapshots automatically when it receives a signal.
- An on demand snapshot cannot be triggered on tables without a primary key.
- Schema changes while a pipeline has a SNAPSHOTTING status should be avoided and will cause the pipeline to fail if they occur.
Setting up a signal table
- A signal table should be pipeline-specific to avoid accidental overlap of snapshot executions.
- The connection details provided for the source must have write permissions to the signal table.
- The pipeline will insert a row into this table to trigger a snapshot.
- The table should be created with the correct structure.
- Supplemental logging must be enabled for the signal table.
1. Signal table creation
CREATE TABLE IF NOT EXISTS <schema>.<table> ( id VARCHAR(42) PRIMARY KEY, type VARCHAR(32) NOT NULL, data VARCHAR(2048) NULL );
The signal table should have three columns:
id: An identifier for each signal execution of type VARCHAR(42).
type: Indicates the type of signal being sent of type VARCHAR(32).
data: Stores additional data related to the signal of type VARCHAR(2048).
2. Privileges for database user
GRANT INSERT ON <schema>.<table> TO '<cdcuser>'@'%';
Ensure that the database user, specifically
<cdcuser> (user configured for CDC), has the required INSERT privileges on the signal table.
By correctly configuring the signal table, on demand snapshots can be enabled, and new tables added to the pipeline after the initial startup will be snapshotted. If on demand snapshots are disabled, any new tables added to the pipeline will capture changes from that point onwards but will not sync their historical data with the target, and the pipeline will immediately have a STREAMING status.
Configuring a new table
When adding tables to a pipeline, they must be correctly configured to allow the on demand snapshot to take place and to ensure change data capture will stream future changes on those tables. Read Configuring MySQL database for a more comprehensive outline of these requirements.
Grant permissions to CDC user
To enable on demand snapshots and CDC on new tables, the user configured for CDC must have the expected permissions. These permissions should be applied by the root user. Use the following SQL command:
-- Grant required privileges on all tables. Check with your DBA for an appropriate mask. GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO '<cdcuser>'@'%';
<cdcuser> with the name of the user configured for CDC.
By granting these privileges, the CDC user will have the necessary permissions to access the new tables and capture changes for on demand snapshots and CDC.
If on demand snapshots are disabled, any new tables added to a pipeline and configured for CDC will capture changes from the point of addition. However, historical data will not be synced with the target, and the pipeline will immediately have a STREAMING status.