Salesforce Incremental Load
This article is part of the series on Incremental Load Tools and Shared Jobs.
The Salesforce Incremental Load component lets users configure a pre-existing (read-only) shared job that will perform an incremental load from Salesforce. When the component is added to the Matillion ETL canvas, the Salesforce Incremental Load wizard is activated. Once this wizard is finalized, a unique component is added to the canvas, with a custom configuration provided by the user's setup choices.
Users should schedule their incremental load jobs to run periodically for the job to continually update the created tables. To learn more about scheduling, read Manage Schedules.
Salesforce Incremental Load Setup (Snowflake)
In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.
1. Connection Details
Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.
- Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
- Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
- Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in Snowflake. By default, this box is not checked.
Click Next.
2. Data Sources
Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.
Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load. By default, the right column is empty, and users must manually add the columns they wish to load.
Use the text fields above the columns to refine your searches.
Click Next.
3. Columns
Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.
Click the settings icon to navigate into a similar multiple-select dialog that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.
Click Next.
4. Staging Configuration
Page 4 of the wizard requires users to specify details for data staging.
Property | Type | Description |
---|---|---|
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Staging Warehouse | drop-down | Select the staging warehouse. |
Staging Database | drop-down | Select the staging database. |
Staging Schema | drop-down | Select the staging schema. |
Click Next.
5. Target Configuration
Page 5 of the wizard requires users to specify target data warehouse details.
Property | Type | Description |
---|---|---|
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Warehouse | drop-down | Select the target warehouse. |
Target Database | drop-down | Select the target database. |
Target Schema | drop-down | Select the target schema. |
Concurrency | drop-down | Select whether to load data in a concurrent or sequential method. |
Click Create & Run to finish the setup, or else click Back to cycle back through the wizard and make any desired changes.
Salesforce Incremental Load Setup (Delta Lake on Databricks)
In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.
1. Connection Details
Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.
- Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
- Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
- Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in Databricks. By default, this box is not checked.
Click Next.
2. Data Sources
Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.
Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.
By default, the right column is empty, and users must manually add the columns they wish to load.
Click Next.
3. Columns
Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.
Click the settings icon to navigate into a similar multiple-select dialog that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.
Click Next.
4. Staging Configuration
Page 4 of the wizard requires users to specify details for data staging.
Property | Type | Description |
---|---|---|
Catalog | drop-down | Select a Databricks Unity Catalog. |
Staging Database | drop-down | Select the staging database. |
Target Database | drop-down | Select the target database. |
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Concurrency | drop-down | Select whether to load data in a Concurrent or Sequential method. |
Click Create & Run to finish the setup, or else click Back to cycle back through the wizard and make any desired changes.
Salesforce Incremental Load Setup (Redshift)
In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.
1. Connection Details
Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.
- Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
- Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
- Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in Redshift. By default, this box is not checked.
Click Next.
2. Data Sources
Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.
Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load. By default, the right column is empty, and users must manually add the columns they wish to load.
In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.
Use the text fields above the columns to refine your searches.
Click Next.
3. Columns
Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.
Click the settings icon to navigate into a similar multiple select dialog that, but that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.
Click Next.
4. Configuration
Page 4 of the wizard requires users to specify data warehouse details.
Property | Type | Description |
---|---|---|
Staging Bucket | drop-down | Select the S3 bucket from the drop-down list for data staging. The available buckets depend on the selected Redshift cluster. |
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Stage Schema | drop-down | Select the Redshift schema via which tables will be staged. |
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Schema | drop-down | Select the Redshift schema into which tables will be loaded. |
Target Distribution Style | drop-down | Select the distribution style. All copies rows to all nodes in the Redshift Cluster. Even distributes rows around the Redshift cluster evenly (this is the default setting). |
Concurrency | drop-down | Select whether to load data in a Concurrent or Sequential method. |
Click Create & Run to finish the setup; or else click Back to cycle back through the wizard.
Salesforce Incremental Load Setup (BigQuery)
In the Components panel, type "salesforce incremental load" to locate the component, and add it to the canvas. This opens the Salesforce Incremental Load wizard.
1. Connection Details
Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load Generator, and requires basic connection information.
- Salesforce OAuth: Here, users can select a configured Salesforce OAuth entry from the drop-down list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.
- Parameter | Value: Next, specify any connection options. These exist as parameter-value pairs. To add a parameter, click +. Users can consult the Salesforce Data Model for more information about connection options.
- Sync Deleted Records: Check this box if you wish to synchronize record deletions from source data to the target table. This retrieves the IDs of records that have been soft deleted from Salesforce, and hard deletes the corresponding records from the target table in BigQuery. By default, this box is not checked.
Click Next.
2. Data Sources
Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.
Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load. By default, the right column is empty, and users must manually add the columns they wish to load.
Use the text fields above the columns to refine your searches.
Click Next.
3. Columns
Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.
Click the settings icon to navigate into a similar multiple select dialog that, but that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.
Click Next.
4. Staging Configuration
Page 4 of the wizard requires users to specify details for data staging.
Property | Type | Description |
---|---|---|
Cloud Storage Area | drop-down | Select the Google Cloud Storage bucket to stage the data. |
Staging Table Prefix | string | Specify a prefix to be added to all tables that are staged. |
Staging Project | drop-down | Select the staging project. |
Staging Dataset | drop-down | Select the staging dataset. |
Click Next.
5. Target Configuration
Page 5 of the wizard requires users to specify target data warehouse details.
Property | Type | Description |
---|---|---|
Target Table Prefix | string | Specify a prefix to be added to all tables in the load. |
Target Project | drop-down | Select the target BigQuery Project. |
Target Dataset | drop-down | Select the target dataset. |
Concurrency | drop-down | Select whether to load data in a concurrent or sequential method. |
Click Create & Run to finish the setup; or else click Back to cycle back through the wizard.
Completion
Upon completion of the wizard, a Salesforce Incremental Load component will be present on the canvas. This component isn't identical to the Salesforce Query component, but it does utilise the same ideas to perform an incremental load from Salesforce.
To make changes to the component, simply click it, and click into any of the component's properties—exactly as you would with any other Matillion ETL component. Users should avoid making changes to the component during a running workflow.
Enable schema drift
Schema drift support accommodates changes made to the source data, such as:
- Missing columns as a result of changes in the Salesforce schema. Missing columns are loaded as NULL.
- Data type changes for specified columns in the shared job configuration. For further information, see below.
- Tables no longer present in the Salesforce schema. Any missing tables will no longer be loaded. However, your shared job will fail. All other tables specified as part of the configuration in the shared job will be loaded. If this scenario occurs, edit your shared job Table and Columns grid variable to remove the missing table.
- Manual addition of new tables or columns via the Table and Columns grid variable within the existing shared job configuration. If a new table or column is added to your source, it is not added to the shared job configuration as a default behavior. However, any new tables or columns can be added manually.
Data Type changes will also be accommodated, but if these are not compatible changes for the target cloud platform, the current column will be renamed as <column_name>_datetime
and the column re-purposed as the new data type. The format of the datetime extension is _yyyymmddhhmmss
( e.g._20210113110334
) and will be the same for all columns in the same table in the same shared job configuration. The new column will be NULL up to the date of change—this should be considered for downstream dependencies such as views and reports.
Upon completion of the wizard, view the list of the component's properties, and click the configure icon next to Automatically Update Target Metadata. Delete "No" and replace it by typing "Yes" into the text field provided. Then, click OK to save the change, and enable schema drift support.
Deny listed tables
The following tables have been deny listed:
- AccountHistory
- Announcement
- ContactHistory
- ContentDocumentLink
- ContentFolderItem
- ContentFolderMember
- ContentHubItem
- DataStatistics
- DatacloudAddress
- EntityParticle
- FeedAttachment
- FeedItem
- FeedRevision
- FieldDefinition
- FlexQueueItem
- LeadHistory
- LeadShare
- ListViewChartInstance"
- Note
- OutgoingEmail
- OutgoingEmailRelation
- OwnerChangeOptionInfo
- PicklistValueInfo
- PlatformAction
- RelationshipDomain
- RelationshipInfo
- SearchLayout
- SiteDetail
- TaskFeed
- TaskRelation
- UserEntityAccess
- UserFieldAccess
- Vote