Microsoft SQL Server snapshots

When setting up a pipeline using the Microsoft SQL Server connector, you can enable or disable snapshotting features. Snapshotting can be modified after a pipeline has started by stopping the pipeline and editing its configuration.

The Microsoft SQL Server connector offers two snapshot configurations:

Initial snapshot.
On demand snapshot.

Initial snapshot

This configuration allows you to take an initial snapshot of the specified tables when the pipeline is started. The initial snapshot captures the current state of the data in those tables at the beginning of the pipeline execution. By enabling this option, you ensure that the pipeline starts with the most recent data available. The initial snapshot setting will only effect a pipeline on its first startup.

Initial snapshot effect: The initial snapshot setting only impacts the pipeline during its first startup.
First startup determination: A pipeline is considered to be in its first startup under the following conditions:
- The target location is empty.
- The last known location in the database logs is unknown or invalid.
Initial snapshots enabled: If initial snapshots are enabled, the connector will fetch all historical data from the configured tables. During the initial snapshot process, the pipeline will have a SNAPSHOTTING status. If the pipeline is stopped before the snapshot concludes, the initial snapshot will start again when the pipeline is restarted.
Snapshot completion and streaming: Once the initial snapshot is finished, the pipeline transitions to a STREAMING status. After this point, any changes that occur in the database will be streamed, ensuring that the target remains in sync with the source database.
Initial snapshots disabled: If the initial snapshot option is disabled, the pipeline will immediately have a STREAMING status when it is started. Only changes made to the database from that point on will be streamed to the target.

On demand snapshot

Enabling on demand snapshot allows you to trigger snapshots for specific tables in the pipeline manually. This means that new tables can be added to the capture list after the pipeline's initial startup, and snapshots for those tables can be triggered as needed. On demand snapshots are useful when you want to capture specific points in time or when you need to dynamically expand the set of tables being captured without losing historical data.

Note

On demand snapshots require a minimum agent version of 2.76.1.

Dynamic table addition: Enabling on demand snapshots lets to trigger snapshots for newly added tables in the pipeline. This ensures that historical data from the newly added tables is captured without any loss.
On Demand snapshot process: While an on demand snapshot is in progress, the pipeline will display a SNAPSHOTTING status. If the pipeline is stopped during the snapshot process, the snapshot will resume on pipeline restart.
Snapshot completion and streaming: After the on demand snapshot is finished, the pipeline returns to a STREAMING status. At this point, all tables in the target location will be in sync with the source.
Signal table requirement: To enable on demand snapshots for Microsoft SQL Server, a signal table must be configured in the source database. This table is tracked by the connector and automatically triggers snapshots when it receives a signal.

Note

An on demand snapshot cannot be triggered on tables without a primary key.
Schema changes while a pipeline has a SNAPSHOTTING status should be avoided and will cause the pipeline to fail if they occur.

Setting up a signal table

A signal table should be pipeline-specific to avoid accidental overlap of snapshot executions.
The connection details provided for the source must have write permissions to the signal table.
The pipeline will insert a row into this table to trigger a snapshot.
The table should be created with the correct structure.
Supplemental logging must be enabled for the signal table.

Navigate to the correct database:

   USE <database>

Create the signal table with the following SQL:

   CREATE TABLE <schema>.<table> (
   id VARCHAR(42) PRIMARY KEY,
   type VARCHAR(32) NOT NULL,
   data VARCHAR(2048) NULL
   );

The signal table should have three columns:

id: An identifier for each signal execution of type VARCHAR(42).
type: Indicates the type of signal being sent of type VARCHAR(32).
data: Stores additional data related to the signal of type VARCHAR(2048).

2. Configure for SQL Server CDC

   EXEC sys.sp_cdc_enable_table  
   @source_schema = N'<schema>',  
   @source_name = N'<table>',  
   @role_name = N'<user>',
   @supports_net_changes = 1;

Replace <schema> and <table> with the appropriate schema and table names for your signal table. Replace <user> with the user configured for CDC.

3. Database user privileges

   GRANT SELECT ON OBJECT::<schema>.<table> TO <user>;
   GRANT INSERT ON OBJECT::<schema>.<table> TO <user>;

Replace <schema> and <table> with the names used in the create statement, and <cdcuser> with the user configured for CDC.

4. Confirm CDC configuration

Use a system stored procedure to confirm that the table is correctly configured for CDC:

EXEC sys.sp_cdc_help_change_data_capture;

By correctly configuring the signal table, on demand snapshots can be enabled, and new tables added to the pipeline after the initial startup will be snapshotted. If on demand snapshots are disabled, any new tables added to the pipeline will capture changes from that point onwards but will not sync their historical data with the target, and the pipeline will immediately have a STREAMING status.

Configuring a new table

1. Enable CDC on new tables

To enable on demand snapshots and CDC on new tables, use the sys.sp_cdc_enable_table system stored procedure. Run the following SQL command for each new table added to the pipeline:

EXEC sys.sp_cdc_enable_table
@source_schema = N'<schema>',
@source_name = N'<table>',
@role_name = N'<user>',
@supports_net_changes = 1;

Replace <schema> and <table> with the appropriate schema and table names for each new table. Replace <user> with the name of the user configured for CDC.

2. Confirm configuration with a SQL server system stored procedure

To verify that the new tables are correctly configured for CDC, use the sys.sp_cdc_help_change_data_capture system stored procedure:

EXEC sys.sp_cdc_help_change_data_capture;

This will show information about the tables and their change data capture configuration.

3. Grant SELECT privilege to CDC user

For on demand snapshots to take place, the user configured for CDC must have the SELECT privilege on the new tables. Use the following SQL command for each table that needs to be snapshotted:

GRANT SELECT ON OBJECT::<schema>.<table> TO <user>;

Replace <schema> and <table> with the names of each table to be snapshotted, and <user> with the user configured for CDC.