Skip to content

Salesforce Incremental Load

This article is part of the series on Incremental Load Tools and Shared Jobs.

The Salesforce Incremental Load component lets users configure a pre-existing (read-only) shared job that will perform an incremental load from Salesforce. When the component is added to the Matillion ETL canvas, the Salesforce Incremental Load wizard is activated. Once this wizard is finalized, a unique component is added to the canvas, with a custom configuration provided by the user's setup choices.

Users should schedule their incremental load jobs to run periodically for the job to continually update the created tables. To learn more about scheduling, read Manage Schedules.


Salesforce Incremental Load Setup (Snowflake)

In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

  • Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
  • Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
  • Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in Snowflake. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load. By default, the right column is empty, and users must manually add the columns they wish to load.

Use the text fields above the columns to refine your searches.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Click the settings icon to navigate into a similar multiple-select dialog that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Staging Configuration

Page 4 of the wizard requires users to specify details for data staging.

Property Type Description
Staging Table Prefix string Specify a prefix to be added to all tables that are staged.
Staging Warehouse drop-down Select the staging warehouse.
Staging Database drop-down Select the staging database.
Staging Schema drop-down Select the staging schema.

Click Next.

5. Target Configuration

Page 5 of the wizard requires users to specify target data warehouse details.

Property Type Description
Target Table Prefix string Specify a prefix to be added to all tables in the load.
Target Warehouse drop-down Select the target warehouse.
Target Database drop-down Select the target database.
Target Schema drop-down Select the target schema.
Concurrency drop-down Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup, or else click Back to cycle back through the wizard and make any desired changes.


Salesforce Incremental Load Setup (Delta Lake on Databricks)

In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

  • Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
  • Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
  • Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in Databricks. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

By default, the right column is empty, and users must manually add the columns they wish to load.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Click the settings icon to navigate into a similar multiple-select dialog that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Staging Configuration

Page 4 of the wizard requires users to specify details for data staging.

Property Type Description
Catalog drop-down Select a Databricks Unity Catalog.
Staging Database drop-down Select the staging database.
Target Database drop-down Select the target database.
Staging Table Prefix string Specify a prefix to be added to all tables that are staged.
Target Table Prefix string Specify a prefix to be added to all tables in the load.
Concurrency drop-down Select whether to load data in a Concurrent or Sequential method.

Click Create & Run to finish the setup, or else click Back to cycle back through the wizard and make any desired changes.


Salesforce Incremental Load Setup (Redshift)

In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

  • Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
  • Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
  • Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in Redshift. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load. By default, the right column is empty, and users must manually add the columns they wish to load.

In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

Use the text fields above the columns to refine your searches.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Click the settings icon to navigate into a similar multiple select dialog that, but that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Configuration

Page 4 of the wizard requires users to specify data warehouse details.

Property Type Description
Staging Bucket drop-down Select the S3 bucket from the drop-down list for data staging. The available buckets depend on the selected Redshift cluster.
Staging Table Prefix string Specify a prefix to be added to all tables that are staged.
Stage Schema drop-down Select the Redshift schema via which tables will be staged.
Target Table Prefix string Specify a prefix to be added to all tables in the load.
Target Schema drop-down Select the Redshift schema into which tables will be loaded.
Target Distribution Style drop-down Select the distribution style. All copies rows to all nodes in the Redshift Cluster. Even distributes rows around the Redshift cluster evenly (this is the default setting).
Concurrency drop-down Select whether to load data in a Concurrent or Sequential method.

Click Create & Run to finish the setup; or else click Back to cycle back through the wizard.


Salesforce Incremental Load Setup (BigQuery)

In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load Generator, and requires basic connection information.

  • Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
  • Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
  • Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in BigQuery. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load. By default, the right column is empty, and users must manually add the columns they wish to load.

Use the text fields above the columns to refine your searches.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Click the settings icon to navigate into a similar multiple select dialog that, but that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Staging Configuration

Page 4 of the wizard requires users to specify details for data staging.

Property Type Description
Cloud Storage Area drop-down Select the Google Cloud Storage bucket to stage the data.
Staging Table Prefix string Specify a prefix to be added to all tables that are staged.
Staging Project drop-down Select the staging project.
Staging Dataset drop-down Select the staging dataset.

Click Next.

5. Target Configuration

Page 5 of the wizard requires users to specify target data warehouse details.

Property Type Description
Target Table Prefix string Specify a prefix to be added to all tables in the load.
Target Project drop-down Select the target BigQuery Project.
Target Dataset drop-down Select the target dataset.
Concurrency drop-down Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup; or else click Back to cycle back through the wizard.


Completion

Upon completion of the wizard, a Salesforce Incremental Load component will be present on the canvas. This component isn't identical to the Salesforce Query component, but it does utilise the same ideas to perform an incremental load from Salesforce.

To make changes to the component, simply click it, and click into any of the component's properties—exactly as you would with any other Matillion ETL component. Users should avoid making changes to the component during a running workflow.


Enable schema drift

Schema drift support accommodates changes made to the source data, such as:

  • Missing columns as a result of changes in the Salesforce schema. Missing columns are loaded as NULL.
  • Data type changes for specified columns in the shared job configuration. For further information, see below.
  • Tables no longer present in the Salesforce schema. Any missing tables will no longer be loaded. However, your shared job will fail. All other tables specified as part of the configuration in the shared job will be loaded. If this scenario occurs, edit your shared job Table and Columns grid variable to remove the missing table.
  • Manual addition of new tables or columns via the Table and Columns grid variable within the existing shared job configuration. If a new table or column is added to your source, it is not added to the shared job configuration as a default behavior. However, any new tables or columns can be added manually.

Data Type changes will also be accommodated, but if these are not compatible changes for the target cloud platform, the current column will be renamed as <column_name>_datetime and the column re-purposed as the new data type. The format of the datetime extension is _yyyymmddhhmmss ( e.g._20210113110334) and will be the same for all columns in the same table in the same shared job configuration. The new column will be NULL up to the date of change—this should be considered for downstream dependencies such as views and reports.

Upon completion of the wizard, view the list of the component's properties, and click the configure icon next to Automatically Update Target Metadata. Delete "No" and replace it by typing "Yes" into the text field provided. Then, click OK to save the change, and enable schema drift support.


Deny listed tables

The following tables have been deny listed:

  • AccountHistory
  • Announcement
  • ContactHistory
  • ContentDocumentLink
  • ContentFolderItem
  • ContentFolderMember
  • ContentHubItem
  • DataStatistics
  • DatacloudAddress
  • EntityParticle
  • FeedAttachment
  • FeedItem
  • FeedRevision
  • FieldDefinition
  • FlexQueueItem
  • LeadHistory
  • LeadShare
  • ListViewChartInstance"
  • Note
  • OutgoingEmail
  • OutgoingEmailRelation
  • OwnerChangeOptionInfo
  • PicklistValueInfo
  • PlatformAction
  • RelationshipDomain
  • RelationshipInfo
  • SearchLayout
  • SiteDetail
  • TaskFeed
  • TaskRelation
  • UserEntityAccess
  • UserFieldAccess
  • Vote

Video