Failover recovery for streaming pipelines🔗
When a source database fails over to a standby, the Streaming agent must reconnect and resume streaming from the correct position. Whether this happens automatically depends on your database version and configuration.
This page describes the failover behavior for each supported source and explains how to configure your environment to minimize data loss and downtime.
PostgreSQL🔗
PostgreSQL uses replication slots to track how far the Streaming agent has read from the change log. During failover, the new primary must have this replication state available. If it does not, the pipeline cannot safely resume.
| PostgreSQL version | Failover behavior | Recommended action |
|---|---|---|
| 17+ | Continues streaming automatically (with configuration) | Enable slot.failover |
| ≤16 | May fail or miss data | Restart pipeline and run a snapshot |
PostgreSQL 17🔗
PostgreSQL 17 introduced native replication slot synchronization. When enabled, the standby maintains a copy of the replication slot, including the current LSN, allowing the pipeline to resume streaming after failover without data loss.
Requirements🔗
Enable replication slot synchronization on your PostgreSQL instance:
sync_replication_slots = on
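Slot synchronization runs on the standby and usually depends on a few related settings. As a sketch (parameter names are from the PostgreSQL 17 documentation; verify them against your own environment):

```sql
-- On the physical standby (PostgreSQL 17+): enable replication slot
-- synchronization. Both parameters can be applied with a reload.
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;  -- keeps the primary from removing rows the slot still needs
SELECT pg_reload_conf();
```

Slot synchronization also requires the standby to connect through a named physical slot (`primary_slot_name`) and to include a `dbname` in `primary_conninfo`; check the PostgreSQL documentation for the full list of prerequisites.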
Configure slot failover🔗
To instruct the Streaming agent to create a failover-enabled replication slot, add the following advanced property when creating or editing your streaming pipeline:
| Property | Value |
|---|---|
| `slot.failover` | `true` |
To add this property:
- Open your streaming pipeline for editing. Read Manage streaming pipelines for details.
- Expand Advanced properties.
- Enter `slot.failover` as the key and `true` as the value.
- Click Save pipeline.
When the pipeline starts, the Streaming agent creates a replication slot with failover enabled. PostgreSQL automatically syncs the slot, including the current LSN, to the standby. You can verify this by confirming that the replication slot exists on both the primary and standby databases.
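One way to verify, as a sketch, is to query `pg_replication_slots` on both nodes. In PostgreSQL 17 this view exposes a `failover` column marking failover-enabled slots and a `synced` column that is true for the copy maintained on the standby (the slot name below is a placeholder for your pipeline's slot):

```sql
-- Run on both the primary and the standby.
SELECT slot_name, failover, synced, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'my_pipeline_slot';  -- hypothetical slot name
```

On the primary you should see the slot with `failover = true`; on the standby, the synchronized copy with `synced = true`.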
Behavior during a failover🔗
When the primary becomes unavailable:
- The Streaming agent attempts to reconnect.
- If you are using a proxy or load balancer, it reconnects automatically to the promoted standby and continues streaming from the last recorded LSN.
- If you are connecting directly to the database host, the pipeline stops. Update the connection to the new primary and restart the pipeline.
Note
Using a proxy or load balancer is recommended for automatic failover recovery. Without one, you must manually update the pipeline connection after failover.
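For illustration, with a proxy in place the pipeline's connection targets a stable endpoint rather than a specific database host, so the promoted standby is reachable at the same address. The hostname, port, and database below are placeholders:

```sql
-- libpq-style connection string pointing at a proxy endpoint (values are examples)
-- host=pg-proxy.example.internal port=5432 dbname=appdb user=streaming_agent
```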
PostgreSQL 16 and below🔗
PostgreSQL 16 does not support automatic synchronization of replication slots between the primary and standby. After a failover, the new primary may not have the replication state required for the Streaming agent to resume streaming.
Behavior during a failover🔗
When the primary goes down, the Streaming agent attempts to reconnect. Reconnection may fail because the replication slot does not exist on the new primary or is out of sync.
You may see an error similar to:
The connector is trying to read change stream starting at LSN {<lsn_value>},
but this is no longer available on the server. Reconfigure the connector to
use a snapshot mode when needed.
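To diagnose this state, you can check whether the pipeline's slot exists at all on the promoted primary. A minimal query, assuming access to the new primary:

```sql
-- On the promoted primary: list logical slots and their positions.
-- If the pipeline's slot is missing, the agent cannot resume from its last LSN.
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
```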
Manual slot pre-creation🔗
You can create replication slots on both the primary and standby before starting the pipeline. However, this approach is not reliable because the LSN is not automatically kept in sync between nodes. This can result in missed or inconsistent data after failover.
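As an illustration of why this is fragile, pre-creating a slot looks like the following (slot name and output plugin are assumptions; `pgoutput` is a common choice for logical decoding). Each slot's LSN advances independently, so the copy on the standby does not track the primary's position:

```sql
-- Run separately on the primary and on the standby (PostgreSQL 16 supports
-- logical slots on standbys). The two slots are NOT kept in sync.
SELECT pg_create_logical_replication_slot('my_pipeline_slot', 'pgoutput');
```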
Recommended recovery approach🔗
After a failover on PostgreSQL 16, run a snapshot of the affected tables to ensure a consistent baseline:
- Wait for the promoted standby to become the new primary and confirm it's accessible.
- Start or restart the streaming pipeline, pointed at the new primary.
- Request a new snapshot of all tables in the pipeline. Read Configuring a snapshot for details.
This ensures data consistency after failover, but requires reloading the dataset.
Tip
Use PostgreSQL 17 or later with replication slot failover enabled to minimize disruption and avoid data loss during failover.