Oracle snapshots
When setting up a pipeline using the Oracle connector, you can enable or disable snapshotting features. Snapshotting can be modified after a pipeline has started by stopping the pipeline and editing its configuration.
The Oracle connector offers two snapshot configurations:
- Initial snapshot.
- On demand snapshot.
Initial snapshot
This configuration allows you to take an initial snapshot of the specified tables when the pipeline is started. The initial snapshot captures the current state of the data in those tables at the beginning of the pipeline execution. By enabling this option, you ensure that the pipeline starts with the most recent data available. The initial snapshot setting will only effect a pipeline on its first startup.
- Initial snapshot effect: The initial snapshot setting only impacts the pipeline during its first startup.
- First startup determination: A pipeline is considered to be in its first startup under the following conditions:
- The target location is empty.
- The last known location in the database logs is unknown or invalid.
- Initial snapshots enabled: If initial snapshots are enabled, the connector will fetch all historical data from the configured tables. During the initial snapshot process, the pipeline will have a SNAPSHOTTING status. If the pipeline is stopped before the snapshot concludes, the initial snapshot will start again when the pipeline is restarted.
- Snapshot completion and streaming: Once the initial snapshot is finished, the pipeline transitions to a STREAMING status. After this point, any changes that occur in the database will be streamed, ensuring that the target remains in sync with the source database.
- Initial snapshots disabled: If the initial snapshot option is disabled, the pipeline will immediately have a STREAMING status when it is started. Only changes made to the database from that point on will be streamed to the target.
On demand snapshot
Enabling on demand snapshot allows you to trigger snapshots for specific tables in the pipeline manually. This means that new tables can be added to the capture list after the pipeline's initial startup, and snapshots for those tables can be triggered as needed. On demand snapshots are useful when you want to capture specific points in time or when you need to dynamically expand the set of tables being captured without losing historical data.
Note
On demand snapshots require a minimum agent version of 2.76.1
.
- Dynamic table addition: Enabling on demand snapshots lets to trigger snapshots for newly added tables in the pipeline. This ensures that historical data from the newly added tables is captured without any loss.
- On Demand snapshot process: While an on demand snapshot is in progress, the pipeline will display a SNAPSHOTTING status. If the pipeline is stopped during the snapshot process, the snapshot will resume on pipeline restart.
- Snapshot completion and streaming: After the on demand snapshot is finished, the pipeline returns to a STREAMING status. At this point, all tables in the target location will be in sync with the source.
- Schema structure requirement: The Oracle connector must have the schema structure of a table before it can snapshot it. The connector saves the schema structure of a table if it exists within the database on the first pipeline startup or when it receives records for that table. If a table is created after the initial pipeline start, it needs to have a CRUD operation (e.g., insert, update, delete) occur on it before the pipeline is restarted. This allows the connector to stream records for the new table, save its schema, and initialize a snapshot.
- Skipping tables with unknown schema: If a table's schema is unknown to the pipeline, it will skip snapshotting that particular table. However, future operations on that table will still be streamed as expected.
- Signal table requirement: To enable on demand snapshots for Oracle, a signal table must be configured in the source database. This table is tracked by the connector and triggers snapshots automatically when it receives a signal.
Note
- An on demand snapshot cannot be triggered on tables without a primary key.
- Schema changes while a pipeline has a SNAPSHOTTING status should be avoided and will cause the pipeline to fail if they occur.
Setting up a signal table
- A signal table should be pipeline-specific to avoid accidental overlap of snapshot executions.
- The connection details provided for the source must have write permissions to the signal table.
- The pipeline will insert a row into this table to trigger a snapshot.
- The table should be created with the correct structure.
- Supplemental logging must be enabled for the signal table.
If required, navigate into the pluggable database. Replace <pdb>
with your pluggable database name:
ALTER SESSION SET CONTAINER=<pdb>
1. Signal table creation
Create the signal table with the following SQL statement:
CREATE TABLE <schema>.<table> (
id VARCHAR(42) PRIMARY KEY,
type VARCHAR(32) NOT NULL,
data VARCHAR(2048) NULL
);
Replace <schema>
and <table>
with the appropriate names according to your database naming convention.
The signal table should have three columns:
id
: An identifier for each signal execution of type VARCHAR(42).type
: Indicates the type of signal being sent of type VARCHAR(32).data
: Stores additional data related to the signal of type VARCHAR(2048).
2. Supplemental logging
Enable supplemental logging for the signal table with the ALTER TABLE
statement:
ALTER TABLE <schema>.<table> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
Replace Grant the Replace By correctly configuring the signal table with the appropriate structure, supplemental logging, and user privileges, you can enable on demand snapshots in the Oracle connector. This allows new tables added to the pipeline after the initial startup to be automatically snapshotted, capturing both historical and future changes. Note If on demand snapshots are disabled, new tables added to the pipeline will only capture changes from the point of their addition, without syncing their historical data. In this case, the pipeline will immediately enter a STREAMING status. When adding tables to a pipeline, they must be correctly configured to allow the on demand snapshot to take place and to ensure change data capture will stream future changes on those tables. If required, navigate to the specific pluggable database (PDB) where the new tables belong. Use the following SQL command, replacing For each new table added to the pipeline, enable supplemental logging to capture the required information by Oracle. Use the following SQL command: Replace To verify the status of logging for all tables owned by a given user, you can use the following SQL query: Replace If you have configured the CDC user, you should not need to grant any new permissions. Any permissions previously granted to the CDC user on individual tables should also apply to the new tables. with the names used in the create statement.
3. User privileges
INSERT
privilege on the signal table to the database user configured for running the pipeline:GRANT INSERT ON <schema>.<table> TO <cdcuser>;
<schema>
and <table>
with the names used in the create statement and <cdcuser>
with the user configured for CDC.
Configuring new tables
1. Navigate to the pluggable database (PDB)
<pdb>
with your PDB name:ALTER SESSION SET CONTAINER=<pdb>;
2. Enable supplemental logging
-- Run for each table added to a pipeline
ALTER TABLE <schema>.<table> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
<schema>
and <table>
with the appropriate schema and table names for each new table.3. Verify supplemental logging status
SELECT OWNER, LOG_GROUP_NAME, TABLE_NAME, LOG_GROUP_TYPE, "ALWAYS", "GENERATED"
FROM DBA_LOG_GROUPS
WHERE OWNER = '<username>';
<username>
with the name of the user that the new tables belong to.4. Check CDC user permissions