
Sizing CDC agents

Here we review and explain the factors that can affect pipeline performance. Please review the information below before making your choices. If the requirements of the pipeline change, you can resize the agent by redeploying it with a different resource configuration. For example, if snapshot processing is CPU limited, the agent can initially be deployed with a larger vCPU allocation and downsized to handle the ongoing changes once the snapshot has completed.

There are three key determinants of pipeline performance:

  • CPU: The number of CPU cores.
  • Memory: The available memory to process pipeline data.
  • Network: The quality, speed, and capacity of your network connection.

Please note that the pipeline only reaches a point where it can be resumed without restarting another full snapshot some time after the pipeline status has changed to streaming.


CPU

Insufficient CPU availability will limit the maximum throughput of the pipeline. While the pipeline can continue to stream changes, computational delays in writing changes from the data source to the storage platform mean the CDC agent will gradually fall behind the source. In these circumstances the true rate of changes becomes unclear: the observed rate appears constant, capped at the pipeline's maximum throughput, while the agent shows constant maximum CPU usage.

The longer this delay persists, the further behind the pipeline will fall. Both the pipeline and the data source can experience errors and may even fail. For example, Postgres retains transaction logs back to the point at which the replication slot is positioned. As the pipeline falls further behind, the retained transaction logs will grow, compounding the issue.
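One way to keep this failure mode visible is to track how much WAL the source is retaining for the pipeline's replication slot. The sketch below is a minimal example against a PostgreSQL source using psycopg2; the connection string is a placeholder, and the query simply reports retained WAL for every slot rather than anything specific to the CDC agent.

```python
import psycopg2

# Placeholder connection string; substitute your own source database details.
DSN = "host=localhost dbname=source user=postgres password=postgres"

# Amount of WAL retained for each replication slot: the gap between the
# current WAL write position and the slot's restart_lsn. A steadily growing
# value suggests the pipeline is falling behind the source.
QUERY = """
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
"""

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for slot_name, retained_wal in cur.fetchall():
            print(f"{slot_name}: {retained_wal} of WAL retained")
```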


Memory

Insufficient memory will typically cause the pipeline to fail. In these circumstances you would need to increase the available memory or reduce the volume of data, or number of tables, being processed by the pipeline.

The required memory scales directly with the number of tables being processed. The CDC agent runs parallel tasks for each table, each with its own independent memory footprint, because each table retains its own data buffer for writing changes out to cloud storage.
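As a rough back-of-envelope illustration of that scaling (not the agent's actual allocation logic), the sketch below estimates memory as a fixed baseline plus one buffer per table. The baseline and per-table figures are made-up assumptions; calibrate them against the observed utilization of your own pipeline.

```python
def estimate_memory_mb(num_tables: int,
                       base_mb: int = 512,
                       buffer_mb_per_table: int = 16) -> int:
    """Rough memory estimate: a fixed baseline plus one buffer per table.

    The baseline and per-table buffer sizes are illustrative assumptions,
    not figures published for the CDC agent.
    """
    return base_mb + num_tables * buffer_mb_per_table

# Example: with these assumed figures, 83 tables need roughly 1.8 GB.
print(estimate_memory_mb(83))
```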


Results summary

To help illustrate the earlier points, the table below shows the results of testing a pipeline capturing evenly distributed changes across 83 tables, run across different services and configurations. It shows the snapshot and streaming rates for each service provider at different CPU and memory allocation levels.

:::info{title='Note'}
Absolute values will vary depending on the structure and content of the data being captured, and values across different services cannot be directly compared as the resource abstractions are not necessarily comparable.
:::

| Service | CPU Allocation | Memory Allocation | Snapshot Rate | Streaming Rate |
|---|---|---|---|---|
| AWS Fargate | 1 vCPU | 2GB | 22k | 11k |
| AWS Fargate | 2 vCPU | 4GB | 27k | 29k |
| AWS Fargate | 4 vCPU | 8GB | 28k | 30k |
| Google Compute Engine | 2 vCPU | 4GB | 25k | 15k |
| Google Compute Engine | 4 vCPU | 8GB | 26k | 17k |
| Azure Container Instances | 1 vCPU | 2GB | 19k | 9k |
| Azure Container Instances | 2 vCPU | 4GB | 25k | 13k |
| Azure Container Instances | 4 vCPU | 8GB | 29k | 16k |

Sizing recommendations

The tables below show a general recommendation for CPU and memory allocation according to your anticipated or actual change rate and the number of tables. Please be aware that each pipeline has a unique performance profile and it's recommended that you monitor the agent resource utilization and ensure that the pipeline is keeping up with the incoming changes.

| Change Rate | CPU Allocation |
|---|---|
| Up to 5k/s | 1 vCPU |
| Up to 10k/s | 2 vCPU |
| Up to 20k/s | 4 vCPU |

| Number of Tables | Memory Allocation |
|---|---|
| Up to 100 | 2GB |
| Up to 200 | 4GB |
| Up to 400 | 8GB |

:::info{title='Note'}
Cloud container services don't allow arbitrary vCPU and memory allocations. When making a choice of resources, the recommendation is to take the lowest configuration that satisfies both requirements.
:::
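As an illustration of that rule, the sketch below encodes the two recommendation tables above and returns the smallest configuration that satisfies both the change-rate and table-count requirements. The pairing of CPU and memory into fixed tiers is an assumption based on the configurations shown in the results summary; check the configurations your container service actually offers.

```python
# (max change rate per second, max table count, CPU, memory) for each tier,
# ordered smallest to largest, taken from the recommendation tables above.
TIERS = [
    (5_000, 100, "1 vCPU", "2GB"),
    (10_000, 200, "2 vCPU", "4GB"),
    (20_000, 400, "4 vCPU", "8GB"),
]

def recommend(change_rate_per_s: int, num_tables: int):
    """Return the lowest (CPU, memory) tier that satisfies both requirements."""
    for max_rate, max_tables, cpu, memory in TIERS:
        if change_rate_per_s <= max_rate and num_tables <= max_tables:
            return cpu, memory
    raise ValueError("Requirements exceed the documented sizing guidance")

# Example: ~8k changes/s across 150 tables -> the 2 vCPU / 4GB configuration.
print(recommend(8_000, 150))
```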