Migration: dbt
dbt is supported by the Data Productivity Cloud, so Matillion ETL jobs using Commands for dbt Core can be migrated to Data Productivity Cloud pipelines and continue to run as expected. Some manual intervention in the migrated pipeline will be required to connect it to a Git file repository.
The key difference between dbt jobs in Matillion ETL and dbt pipelines in the Data Productivity Cloud is where file sync happens. Matillion ETL uses a Sync File Source component to fetch the latest commit of dbt files from your Git repository before running dbt Core commands. In the Data Productivity Cloud, file sync is performed as part of the dbt Core component, so a separate Sync File Source component is unnecessary.
Migration path
In Matillion ETL, prior to migrating a job that uses the Sync File Source component:
- Examine the properties of the Sync File Source component in the job and note the External File Source property.
- Open Manage External File Sources from the Project menu.
- In the Manage External File Sources dialog, select the external file source you noted above.
- Note down the following property settings for the file source. You will need these settings to configure the connection in the migrated pipeline.
    - Remote URL
    - Username
    - Password
    - Branch
After running the migration process:
- Open the migrated pipeline in the Data Productivity Cloud Designer.
- The Sync File Source component will have been migrated as "Unknown Component". The component isn't needed in the Data Productivity Cloud, so you should delete it and reconnect the components either side of it.
- Open the dbt Core properties panel and set the dbt Project Location to External repository.
- Configure the component with the properties you noted from the Sync File Source component, as follows:

| Sync File Source property | dbt Core property |
| --- | --- |
| Remote URL | Git URL |
| Username | Git Username |
| Password | Git Password. Because the Data Productivity Cloud doesn't allow storing of passwords directly in a component, you will need to create a secret definition to store it. |
| Branch | Git Branch |

If you have multiple pipelines using the same configuration, you may want to consider using project variables to set these properties, to make future configuration updates easier to perform.
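The property mapping above can be captured as a small checklist helper before you start editing pipelines. This is only an illustrative sketch: `FileSourceSettings`, `to_dbt_core_properties`, and the example values are hypothetical names invented for this example, not part of any Matillion API. It assumes the password has already been stored as a secret definition, so only the secret's name is recorded.

```python
from dataclasses import dataclass


@dataclass
class FileSourceSettings:
    """Values noted from Manage External File Sources in Matillion ETL."""
    remote_url: str
    username: str
    branch: str


def to_dbt_core_properties(source: FileSourceSettings, password_secret: str) -> dict:
    """Map noted file source settings onto the dbt Core property names.

    password_secret is the name of a secret definition (hypothetical here),
    never the password itself.
    """
    # Refuse to build a partial configuration if any noted value is empty.
    missing = [name for name, value in vars(source).items() if not value]
    if missing:
        raise ValueError(f"missing file source settings: {', '.join(missing)}")
    return {
        "dbt Project Location": "External repository",
        "Git URL": source.remote_url,
        "Git Username": source.username,
        "Git Password": password_secret,
        "Git Branch": source.branch,
    }


if __name__ == "__main__":
    # Example values are placeholders, not real credentials or repositories.
    noted = FileSourceSettings("https://example.com/org/dbt-project.git", "svc_dbt", "main")
    for prop, value in to_dbt_core_properties(noted, "dbt_git_password").items():
        print(f"{prop}: {value}")
```

Keeping the mapping in one place like this may help when several pipelines share the same repository settings, because the same values can then be entered consistently in each dbt Core component.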
Note
Sync File Source in Matillion ETL is an optional operation. If you never update the dbt configuration, for example, you would never need to sync with an external Git repository, so you would not include the component in your job. In the Data Productivity Cloud, however, you must configure a repository in the dbt Core component. If you do not have values to copy over from the Matillion ETL file source, you must create appropriate values following the guidance in the dbt Core documentation.
dbt versions
To avoid potential issues, it's best to ensure that your Matillion ETL instance is running the same dbt version as the Data Productivity Cloud when you migrate.
The Data Productivity Cloud runs the most recent stable version of dbt, and is periodically updated to the newest version. However, Matillion ETL allows you to choose whether and when to upgrade dbt, meaning we can't guarantee that your instance will have the latest version when migrating.
Note
You can't manually change the dbt version being used in the Data Productivity Cloud.
Prior to migrating your dbt job, you should therefore update Matillion ETL to the latest dbt version using this command:
python3 -m pip install --upgrade dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery
Read Installing DBT on Matillion ETL for further details.
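To confirm which dbt versions are actually installed on the Matillion ETL instance, a short check along the following lines may be useful. It is a minimal sketch using only the Python standard library (`importlib.metadata`, available in Python 3.8 and later). The function name `installed_dbt_versions` is made up for this example, and the script is assumed to be run with the same `python3` interpreter used for the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# The packages named in the pip command above.
DBT_PACKAGES = ["dbt-core", "dbt-postgres", "dbt-redshift", "dbt-snowflake", "dbt-bigquery"]


def installed_dbt_versions(packages=DBT_PACKAGES):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_dbt_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running the script before and after the upgrade gives a simple record of what changed. It reports only what is installed locally and does not tell you which version the Data Productivity Cloud is currently running.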