Sync Single Table shared job

Process CDC data for a single source table, and then keep a target table in sync with the source table.


Properties

Snowflake

Data Source = drop-down

The CDC pipeline data source. Must be one of the supported CDC pipeline sources in Data Loader:

  • Oracle
  • PostgreSQL
  • SQL Server

Source Database = string

The name of the database containing the source table.


Source Schema = string

The name of the schema containing the source table.


Source Table = string

The name of the source table.


Cloud Storage URL = string

The URL of the location where the CDC change event files are stored. This should have the format: s3://<bucket>/<prefix>/ or azure://<account>.blob.core.windows.net/<container>/<prefix>/
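
For illustration only, the values below are placeholders that match the expected format (the bucket, account, container, and prefix names are hypothetical):

# Placeholder examples of well-formed Cloud Storage URL values (hypothetical names).
s3_url = "s3://my-cdc-bucket/cdc-events/"
azure_url = "azure://myaccount.blob.core.windows.net/cdc-container/cdc-events/"

# Both use a supported scheme and end with a trailing slash, matching the format above.
assert s3_url.startswith("s3://") and s3_url.endswith("/")
assert azure_url.startswith("azure://") and azure_url.endswith("/")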


Warehouse = string

The Snowflake warehouse used to process the data.


Target Database = string

The Snowflake database containing the external table with change events and the target table.


Target Schema = string

The Snowflake schema containing the external table with change events and the target table.


External Table = string

The external table in Snowflake containing the CDC change events.


Target Table Prefix = string

A prefix to prepend to the source table name to generate the target table name. If no target table prefix is specified, the target table will have the same name as the source table.


Fully Qualify Target Table (Recommended) = boolean

Optionally includes the source database and schema in the target table name.

  • Y: The target table name will have the format <source_database>_<source_schema>_<source_table>. This is the default setting.
  • N: The target table name will have the format <source_table>.

It's strongly recommended to leave this set to Y, unless you are confident that your source table names will always be unique.

If the target table name isn't fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents.
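
As a rough illustration of how Target Table Prefix and this setting combine, the sketch below derives a target table name from hypothetical source identifiers. The function and variable names are illustrative only, and the assumption that the prefix is prepended to the fully qualified name is ours rather than documented behaviour.

# Illustrative sketch only: derive a target table name using the rules above.
# Assumption: the prefix is prepended to the fully qualified name.
def target_table_name(source_database, source_schema, source_table,
                      prefix="", fully_qualify=True):
    base = (f"{source_database}_{source_schema}_{source_table}"
            if fully_qualify else source_table)
    return f"{prefix}{base}"

# Fully Qualify Target Table = Y with a hypothetical "tgt_" prefix:
print(target_table_name("sales_db", "public", "orders", prefix="tgt_"))
# -> tgt_sales_db_public_orders

# Fully Qualify Target Table = N: two "orders" tables in different schemas
# resolve to the same target name, illustrating the collision risk above.
print(target_table_name("sales_db", "public", "orders", fully_qualify=False))
print(target_table_name("hr_db", "internal", "orders", fully_qualify=False))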


Transformation Type = drop-down

The type of transformation used when applying the change events to the target table.

  • Copy Table: The target table will be maintained as a copy of the source table.
  • Copy Table with Soft Deletes: Same as Copy Table, but records deleted in the source table will be retained in the target table.
  • Change Log: All change events will be extracted and appended to the target table.

Note

A primary key is required on the source table for Copy Table and Copy Table with Soft Deletes transformations. The primary key is used by the shared job to merge updates into the target table. If the source table doesn't have a primary key, the transformation type will be updated to Change Log.

Note

Change Log is disabled in this version of the shared job and will be enabled in a future release.

Warning

If you intend to change the transformation type for an existing or running workflow, you must also drop the target tables before changing it. The transformation types are mutually exclusive and can't be applied to the same target tables.
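
The following simplified sketch, which is not the shared job's actual implementation, shows how the three transformation types treat the same stream of change events, assuming each event carries an operation, a primary key value, and the row data:

# Simplified, in-memory illustration of the three transformation types.
events = [
    {"op": "insert", "id": 1, "row": {"id": 1, "name": "alpha"}},
    {"op": "update", "id": 1, "row": {"id": 1, "name": "alpha-2"}},
    {"op": "delete", "id": 1, "row": {"id": 1, "name": "alpha-2"}},
]

def copy_table(events, soft_deletes=False):
    # Merge events into the target keyed on the primary key ("id" here);
    # this merge is why Copy Table requires a primary key on the source table.
    target = {}
    for e in events:
        if e["op"] == "delete":
            if soft_deletes:
                target[e["id"]] = {**e["row"], "deleted": True}  # retain the row
            else:
                target.pop(e["id"], None)                        # remove the row
        else:
            target[e["id"]] = e["row"]
    return list(target.values())

def change_log(events):
    # Every change event is appended to the target table as its own record.
    return [{**e["row"], "op": e["op"]} for e in events]

print(copy_table(events))                     # [] - the row was deleted
print(copy_table(events, soft_deletes=True))  # one row, retained and flagged as deleted
print(change_log(events))                     # three appended change records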


Append All Metadata = boolean

Append all metadata columns to the target table. Options are:

  • Y: Append all metadata columns.
  • N: Only append metadata columns required by this job.

For a list of metadata columns, and how the field values are derived, read Append Metadata.


Azure Key Vault URL = string

Required for Azure Blob Storage

The URL of the Azure Key Vault used to store the Azure Blob Storage connection string. This is also called the vault's "DNS Name".


Azure Blob Storage Account Key Secret = string

Required for Azure Blob Storage

Name of the Azure Key Vault secret that stores the Azure Blob Storage account key.


Schema Drift Action = drop-down

Select which action should be taken when schema changes are detected in the source table and require an update to the target table.

  • Update Target: The target table is updated according to the new schema. This is the default setting.
  • Fail Job: This job will fail and no changes will be made to the target table.
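
For illustration, here is a minimal sketch of the two behaviours, using hypothetical column lists rather than the job's real logic:

# Hypothetical sketch of the two Schema Drift Action behaviours.
def apply_schema_drift(target_columns, source_columns, action="Update Target"):
    new_columns = [c for c in source_columns if c not in target_columns]
    if not new_columns:
        return target_columns                    # no drift detected
    if action == "Fail Job":
        raise RuntimeError(f"Schema drift detected: {new_columns}")
    return target_columns + new_columns          # Update Target: extend the schema

print(apply_schema_drift(["id", "name"], ["id", "name", "email"]))
# -> ['id', 'name', 'email']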

Delta Lake on Databricks

Data Source = drop-down

The CDC pipeline data source. Must be one of the supported CDC pipeline sources in Data Loader:

  • Oracle
  • PostgreSQL
  • SQL Server

Source Database = string

The name of the database containing the source table.


Source Schema = string

The name of the schema containing the source table.


Source Table = string

The name of the source table.


Cloud Storage URL = string

The URL of the location where the CDC change event files are stored. This should have the format: s3://<bucket>/<prefix>/ or azure://<account>.blob.core.windows.net/<container>/<prefix>/


Target Database = string

The Databricks database containing the external table with change events and the target table.


Stage Table Prefix = string

A prefix to prepend to the source table name to generate the stage table names.


Target Table Prefix = string

A prefix to prepend to the source table name to generate the target table name. If no Target Table Prefix is specified, the target table will have the same name as the source table.


Fully Qualify Target Table (Recommended) = boolean

Optionally includes the source database and schema in the target table name.

  • Y: The target table name will have the format <source_database>_<source_schema>_<source_table>. This is the default setting.
  • N: The target table name will have the format <source_table>.

It's strongly recommended to leave this set to Y, unless you are confident that your source table names will always be unique.

If the target table name isn't fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents.


Transformation Type = drop-down

The type of transformation used when applying the change events to the target table.

  • Copy Table: The target table will be maintained as a copy of the source table.
  • Copy Table with Soft Deletes: Same as Copy Table, but records deleted in the source table will be retained in the target table.
  • Change Log: All change events will be extracted and appended to the target table.

Note

A primary key is required on the source table for Copy Table and Copy Table with Soft Deletes transformations. The primary key is used by the shared job to merge updates into the target table. If the source table doesn't have a primary key, the transformation type will be updated to Change Log.

Note

Change Log is disabled in this version of the shared job and will be enabled in a future release.

Warning

If you intend to change the transformation type for an existing or running workflow, you must also drop the target tables before changing it. The transformation types are mutually exclusive and can't be applied to the same target tables.


Append All Metadata = boolean

Append all metadata columns to the target table. Options are:

  • Y: Append all metadata columns.
  • N: Only append metadata columns required by this job.

For a list of metadata columns, and how the field values are derived, read Append Metadata.


Azure Key Vault URL = string

Required for Azure Blob Storage

The URL of the Azure Key Vault used to store the Azure Blob Storage connection string. This is also called the vault's "DNS Name".


Azure Blob Storage Account Key Secret = string

Required for Azure Blob Storage

Name of the Azure Key Vault secret that stores the Azure Blob Storage account key.


Schema Drift Action = drop-down

Select which action should be taken when schema changes are detected in the source table and require an update to the target table.

  • Update Target: The target table is updated according to the new schema. This is the default setting.
  • Fail Job: This job will fail and no changes will be made to the target table.

Note

Schema drift handling is disabled in this version of the shared job and will be enabled in a future release.

Amazon Redshift

Data Source = drop-down

The CDC pipeline data source. Must be one of the supported CDC pipeline sources in Data Loader:

  • Oracle
  • PostgreSQL
  • SQL Server

Source Schema = string

The name of the schema containing the source table.


Source Table = string

The name of the source table.


Cloud Storage URL = string

The URL of the location where the CDC change event files are stored. This should have the format: s3://<bucket>/<prefix>/ or azure://<account>.blob.core.windows.net/<container>/<prefix>/


IAM Role = string

The Amazon Resource Name (ARN) of an IAM role with the necessary permissions to access the S3 bucket containing the CDC event data, for example arn:aws:iam::<account_id>:role/<role_name>.


Region = drop-down

(Optional) The AWS Region of the Amazon S3 bucket containing the CDC event data, if it resides in a different region from the Redshift cluster.


Target Schema

The Redshift schema containing the external table with change events and the target table.


Target Table Prefix = string

A prefix to prepend to the source table name to generate the target table name. If no target table prefix is specified, the target table will have the same name as the source table.


Fully Qualify Target Table (Recommended) = boolean

Optionally includes the source database and schema in the target table name.

  • Y: The target table name will have the format <source_database>_<source_schema>_<source_table>. This is the default setting.
  • N: The target table name will have the format <source_table>.

It's strongly recommended to leave this set to Y, unless you are confident that your source table names will always be unique.

If the target table name isn't fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents.


Transformation Type = drop-down

The type of transformation used when applying the change events to the target table.

  • Copy Table: The target table will be maintained as a copy of the source table.
  • Copy Table with Soft Deletes: Same as Copy Table, but records deleted in the source table will be retained in the target table.
  • Change Log: All change events will be extracted and appended to the target table.

Note

A primary key is required on the source table for Copy Table and Copy Table with Soft Deletes transformations. The primary key is used by the shared job to merge updates into the target table. If the source table doesn't have a primary key, the transformation type will be updated to Change Log.

Note

Change Log is disabled in this version of the shared job and will be enabled in a future release.

Warning

If you intend to change the transformation type for an existing or running workflow, you must also drop the target tables before changing it. The transformation types are mutually exclusive and can't be applied to the same target tables.


Append All Metadata = boolean

Append all metadata columns to the target table. Options are:

  • Y: Append all metadata columns.
  • N: Only append metadata columns required by this job.

For a list of metadata columns, and how the field values are derived, read Append Metadata.


Schema Drift Action = drop-down

Select which action should be taken when schema changes are detected in the source table and require an update to the target table.

  • Update Target: The target table is updated according to the new schema. This is the default setting.
  • Fail Job: This job will fail and no changes will be made to the target table.

Google BigQuery

Data Source = drop-down

The CDC pipeline data source. Must be one of the supported CDC pipeline sources in Data Loader:

  • Oracle
  • PostgreSQL
  • SQL Server

Source Database = string

The name of the database containing the source table.


Source Schema = string

The name of the schema containing the source table.


Source Table = string

The name of the source table.


Cloud Storage URL = string

The URL of the location where the CDC change event files are stored. This should have the format: gs://<bucket>/<prefix>/


Target Project = drop-down

The GCP project containing the target table.


Target Dataset = drop-down

The BigQuery dataset containing the target table.


Stage Table Prefix = string

A prefix to prepend to the source table name to generate the stage table names.


Target Table Prefix = string

A prefix to prepend to the source table name to generate the target table name. If no Target Table Prefix is specified, the target table will have the same name as the source table.


Fully Qualify Target Table (Recommended) = boolean

Optionally includes the source database and schema in the target table name.

  • Y: The target table name will have the format <source_database>_<source_schema>_<source_table>. This is the default setting.
  • N: The target table name will have the format <source_table>.

It's strongly recommended to leave this set to Y, unless you are confident that your source table names will always be unique.

If the target table name isn't fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents.


Transformation Type = drop-down

The type of transformation used when applying the change events to the target table.

  • Copy Table: The target table will be maintained as a copy of the source table.
  • Copy Table with Soft Deletes: Same as Copy Table, but records deleted in the source table will be retained in the target table.
  • Change Log: All change events will be extracted and appended to the target table.

Note

A primary key is required on the source table for Copy Table and Copy Table with Soft Deletes transformations. The primary key is used by the shared job to merge updates into the target table. If the source table doesn't have a primary key, the transformation type will be updated to Change Log.

Note

Change Log is disabled in this version of the shared job and will be enabled in a future release.

Warning

If you intend to change the transformation type for an existing or running workflow, you must also drop the target tables before changing it. The transformation types are mutually exclusive and can't be applied to the same target tables.


Append All Metadata = boolean

Append all metadata columns to the target table. Options are:

  • Y: Append all metadata columns.
  • N: Only append metadata columns required by this job.

For a list of metadata columns, and how the field values are derived, read Append Metadata.


Schema Drift Action = drop-down

Select which action should be taken when schema changes are detected in the source table and require an update to the target table.

  • Update Target: The target table is updated according to the new schema. This is the default setting.
  • Fail Job: This job will fail and no changes will be made to the target table.

Tables created

The ETL process creates some transient staging tables during a pipeline run; these are deleted at the end of the run, whether or not it succeeds.

Permanent tables are also created to support the pipeline process, including the target table that contains your source data.

Read Tables created to learn more.

