Microsoft SQL Server streaming connector
The Microsoft SQL Server connector can monitor and capture row-level changes within Microsoft SQL Server schema. By connecting to a Microsoft SQL Server database or cluster, this connector takes a consistent snapshot of the schemas in the database. The connector produces a change event record for every insert, update, and delete event occurring in any tables being monitored by the streaming agent.
Encryption is enabled by default. To learn more, read Advanced settings.
Note
The Microsoft SQL Server streaming process will not capture computed values. These values will appear as NULL
in the change table.
Note
We are using the Debezium Microsoft SQL Server connector to connect to this source. For more information, read Debezium connector for Microsoft SQL Server.
Prerequisites
Observe the following information about supported versions and database rules.
- Microsoft SQL Server version 13 (SQL Server 2016 SP1) or higher is supported.
- Your Microsoft SQL Server agent must be running.
- Your database must be configured for streaming. Read Configuring Microsoft SQL Server database to learn more.
Understanding streaming for Microsoft SQL Server
- Data Productivity Cloud streaming pipelines do not interrogate the transaction log; instead, they use the change data capture in Microsoft SQL Server.
- A
cdc.
schema will be created in the database that is enabled for streaming, and then system tables are created for the following:cdc.captured_columns
cdc.change_tables
cdc.ddl_history
cdc.index_columns
cdc.lsn_time_mapping
- Transparent Data Encryption (TDE) is handled and supported.
- By default, three days of data is retained in the Microsoft SQL Server change tables.
- If it appears that the pipeline is not picking up data, you can validate that the internal streaming process is working by querying the change tracking tables (
cdc.captured_columns
) created in the database.
For a deeper explanation of streaming for Microsoft SQL Server, read the Microsoft documentation.
Source setup
Refer to this section when you create a streaming pipeline.
Server address
= string
The server address of your Microsoft SQL Server database.
Port
= integer
The port number used to access your Microsoft SQL Server database. Common ports are 1433
and 1434
; we use 1433
as the default but you can change this.
Database name
= string
The name of your Microsoft SQL Server container database installation.
Read Replica
= boolean
Set to true ✅ if the source database is an always-on, read-only replica database.
Username
= string
The username used to log in to the specified database.
Secrets Manager
= drop-down
Choose the service you use to manage secrets.
Secret name
= string
The name of the secret in your secret manager service that references your Microsoft SQL Server password.
JDBC parameters and specific connection settings
= column editor
This parameter is optional.
Specify any parameter:value pairs as part of an advanced connection. Click Save to finish adding parameters.
Click Connect to establish the connection to your source database.
Create a signal table
Follow this section to create a Microsoft SQL Server signal table.
Create the table
Create a Microsoft SQL Server signal table with the following SQL statement:
CREATE TABLE <schema>.<table> (
id VARCHAR(42) PRIMARY KEY,
type VARCHAR(32) NOT NULL,
data VARCHAR(2048) NULL
)
The signal table should have three columns:
id
: An identifier for each signal execution of type VARCHAR(42).type
: Indicates the type of signal being sent of type VARCHAR(32).data
: Stores additional data related to the signal of type VARCHAR(2048).
Configure streaming for Microsoft SQL Server
EXEC sys.sp_cdc_enable_table
@source_schema = N'<schema>',
@source_name = N'<table>',
@role_name = N'<streaming_user>',
@supports_net_changes = 1
Replace <schema>
and <table>
with the appropriate schema and table names for your signal table. Replace <streaming_user>
with the user configured for streaming.
Grant database user privileges
GRANT SELECT ON OBJECT::<schema>.<table> TO <streaming_user>;
GRANT INSERT ON OBJECT::<schema>.<table> TO <streaming_user>
Replace <schema>
and <table>
with the names used in the create statement, and <streaming_user>
with the user configured for CDC.
Confirm streaming setup
Use a system stored procedure to confirm that the table is correctly configured for streaming:
EXEC sys.sp_cdc_help_change_data_capture
Once you set up the signal table, on-demand snapshots can be enabled, and new tables added to the pipeline after the initial startup will be snapshotted. If on-demand snapshots are disabled, any new tables added to the pipeline will capture changes from that point onwards but will not sync their historical data with the target, and the pipeline will immediately have a STREAMING status.
Advanced settings
Advanced settings are optional. Advanced setting are entered as parameter:value pairs in the Add advanced settings dialog. Some parameters you might want to add are discussed below.
Encryption
In the Pipeline configuration section of the Create streaming pipeline wizard, set the encryption advanced parameters. This dialog lets you configure the connection to Microsoft SQL Server.
Microsoft SQL Server connections use encryptions by default. You can specify security settings with advanced settings, including:
encrypt
: Connect with encryption.true
by default. Set tofalse
to disable encryption.trustServerCertificate
: Implicitly trust a certificate provided by the server. Set totrue
to trust (skip validation of) the Microsoft SQL Server TLS certificate.
Trusting the server certificate can allow users to connect with encryption on. Due to the streaming agent being containerized, it is not currently possible to add certificates to the TrustStore.
For more information, read Microsoft's documentation about Connecting with encryption.
database.applicationIntent
The Microsoft SQL Server connector can capture changes from an always-on, read-only replica.
This setting is set to ReadOnly
. This is required to support the Microsoft SQL Server Always ON (Read Replica) structure. Read SQL Server Always On to learn more.
Configuring new table
When adding tables to a pipeline, they must be correctly configured to allow the on-demand snapshot to take place and to ensure future changes on those tables will be streamed. Read Configure your Mirosoft SQL Server database for a more comprehensive outline of these requirements.
Enable streaming on new tables
To enable on-demand snapshots and streaming on new tables, use the sys.sp_cdc_enable_table
system stored procedure. Run the following SQL command for each new table added to the pipeline:
EXEC sys.sp_cdc_enable_table
@source_schema = N'<schema>',
@source_name = N'<table>',
@role_name = N'<streaming_user>',
@supports_net_changes = 1
Replace <schema>
and <table>
with the appropriate schema and table names for each new table. Replace <streaming_user>
with the name of the user configured for streaming.
Confirm configuration with a system stored procedure
To verify that the new tables are correctly configured for streaming, use the sys.sp_cdc_help_change_data_capture
system stored procedure:
EXEC sys.sp_cdc_help_change_data_capture
This will show information about the tables and their change data capture configuration.
Grant SELECT privilege to streaming user
For on-demand snapshots to take place, the user configured for streaming must have the SELECT privilege on the new tables. Use the following SQL command for each table that needs to be snapshotted:
GRANT SELECT ON OBJECT::<schema>.<table> TO <user>
Replace <schema>
and <table>
with the names of each table to be snapshotted, and <user>
with the user configured for streaming.