
Migration feature parity

Public preview

As a Matillion ETL user, you'll find much of the look and feel of the Data Productivity Cloud familiar, but you shouldn't assume that it always works in exactly the same way. There are some fundamental differences between the two platforms, and you should understand these differences thoroughly before you begin a migration, as some things may require manual effort or even refactoring.

This article lists features in Matillion ETL that are not currently supported in the Data Productivity Cloud. For a full discussion of this topic, read Migration considerations.


Not currently available in the Data Productivity Cloud

The following features aren't currently supported, or are only partially supported, but are intended to be implemented in the future. If you rely on any of these features, you may want to delay migration until we announce their availability. In some cases, workarounds have been identified which will allow your pipelines to continue to work without these features.

| Category | Matillion ETL feature | Status and workaround |
| --- | --- | --- |
| Components | General component parity | Most components will be implemented in a rolling program, except as noted below. |
| Components | Database transactions (BEGIN, COMMIT, ROLLBACK) | These will be implemented in the future. No current workaround. |
| Components | Temporary tables | These will be implemented in the future. As a workaround, refactor the pipeline to use permanent tables. |
| Components | DBT | This is partially available but works differently in the Data Productivity Cloud. Some pipeline refactoring will be needed. |
| Components | External tables | These will be implemented in the future. As a workaround, refactor the pipeline to create the tables using SQL. |
| Components | Unload components (reverse ETL) | These will be implemented in the future. |
| Scripting | Python (Snowflake) | Available but with limitations. Use Python Pushdown in preference. |
| Scripting | Python (Redshift and Databricks) | Currently available, with limitations, only in Hybrid SaaS solutions. Python Pushdown will be implemented in the future. |
| Pipeline set-up | Webhook and Queue Triggers for pipelines | These will be implemented in the future. As a workaround, use Lambda or the equivalent to call the API (see the sketch after this table). |
| Pipeline set-up | Pass variables into an API trigger | Not currently implemented. |
| Pipeline set-up | Shared jobs | These will be implemented in the future. As a workaround, migrate as non-shared pipelines. |
| Staging/Ingestion | User-defined stages (Snowflake) | These will be implemented in the future. |
| Staging/Ingestion | Upload custom JDBC drivers | These will be implemented in the future. |
| Staging/Ingestion | Database Query | Database Query is implemented, but not all options are available. Missing options are: IBM DB2, Netezza, SAP HANA, Teradata. |
| Staging/Ingestion | Data Transfer | Data Transfer is implemented, but not all options are available. Missing options are: Box, Dropbox, HDFS, HTTP, Microsoft Exchange, Google Cloud Storage. |
| Staging/Ingestion | OAuth authentication for connectors | OAuth authentication is implemented for some connectors only. |
| Variables | Shared variables | All variables will have Copied behavior. Shared behavior will be implemented in the future. |
| Variables | Automatic variables | These will be implemented in the future. |
| Variables | Export variables | These will be implemented in the future. |
| Variables | STRUCT data type | These will be implemented in the future. |
| Variables | DATETIME variables | These will be implemented in the future. Currently migrated as STRING, so pipelines may need refactoring for this. |
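
For the webhook and queue trigger workaround above (calling the API from Lambda or an equivalent), the general shape is a small cloud function that starts the pipeline when an event arrives. This is a minimal sketch of an AWS Lambda handler; the endpoint URL, payload shape, and authentication header are placeholders rather than the real Data Productivity Cloud API contract, so check the API documentation for the actual endpoint and request format.

```python
# Minimal sketch of an AWS Lambda handler that triggers a pipeline by calling
# an API endpoint. The URL, payload shape, and auth header are placeholders;
# consult the Data Productivity Cloud API documentation for the real contract.
import json
import os
import urllib.request


def lambda_handler(event, context):
    # Hypothetical endpoint and token, supplied via environment variables.
    url = os.environ["PIPELINE_TRIGGER_URL"]
    token = os.environ["API_TOKEN"]

    payload = json.dumps({"pipeline": "my-pipeline"}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return {"statusCode": response.status}
```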

Required refactoring due to Data Productivity Cloud improvement

We've made changes to key areas in the Data Productivity Cloud, so you'll need to refactor any pipelines using these features or components.

| Category | Matillion ETL feature | Workaround |
| --- | --- | --- |
| Components | Replicate | Not needed, as all components have multiple outputs by default. In the migrated pipeline, remove the "Unknown Component" and connect its input directly to all of the downstream components. |
| Staging/Ingestion | API Extract | Replaced with Custom connectors. |
| Staging/Ingestion | API Query | Replaced with Custom connectors. |
| Staging/Ingestion | Database driver for MySQL | Data Productivity Cloud uses the MariaDB driver for accessing MySQL. Before migrating, test that the MariaDB driver works in your use case. |
| Components | Deprecated components | Replace the deprecated component with the newer component in the Matillion ETL job before migration, or migrate and then replace the "Unknown Component" in the Data Productivity Cloud pipeline. |
| Scripting | Bash | Not supported in Full SaaS solutions. Currently available, with limitations, in Hybrid SaaS solutions. |
| Scripting | Jython | Use Python Pushdown instead. |
| Variables | Stacked iterators | Refactor to split into sub-pipelines, so each has only one iterator. |

Detailed discussion of differences

Variables

Your job variables, environment variables, and grid variables will be migrated. Environment variables become project variables in Data Productivity Cloud, and job variables become pipeline variables. Grid variables become a type of pipeline variable. Read Variables and Grid variables for details.

There are some key differences to note:

  • Data Productivity Cloud doesn't currently allow you to specify whether a variable is copied or shared. All variables in Data Productivity Cloud are assumed to be "copied". The migration tool will automatically convert all SHARED variables to COPIED. Support for shared variables will be added in the future.
  • Date formats for variables aren't currently supported in Data Productivity Cloud. The migration tool will automatically convert all DATETIME variables to STRING (see the sketch after this list). Support for date formats will be added in the future.
  • STRUCT formats for variables aren't currently supported in the Data Productivity Cloud. Support for STRUCT formats will be added in the future.
  • Scalar and grid pipeline variables can't share the same name, as a grid is now a type of pipeline variable, not a separate class of variable.
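
For the DATETIME-to-STRING conversion noted above, any downstream logic that does date arithmetic on the variable will need to parse the string first. This is a minimal sketch in Python; the variable value and format shown are hypothetical.

```python
# Minimal sketch: parse a migrated DATETIME-as-STRING variable value before
# doing date arithmetic. The example value and format are hypothetical.
from datetime import datetime, timedelta

run_date_str = "2024-06-01 09:30:00"  # value read from the migrated STRING variable
run_date = datetime.strptime(run_date_str, "%Y-%m-%d %H:%M:%S")

# The kind of arithmetic that no longer works on a plain string.
previous_day = run_date - timedelta(days=1)
print(previous_day.strftime("%Y-%m-%d %H:%M:%S"))
```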

The following Matillion ETL components for manipulating grid variables are not available yet in Data Productivity Cloud but are coming soon:

  • Append to grid
  • Table metadata to grid
  • JDBC table metadata to grid
  • Remove from grid

This won't affect the migration of grid variables, but jobs that use these components will have to be refactored to work without them.

API Query and API Extract

Data Productivity Cloud does not have API Query or API Extract components, and doesn't use the concept of query profiles or extract profiles. Instead, you can create custom connector components, which can be configured to connect to any compatible third-party API.

If you have a job that uses API Query or API Extract, it can't be migrated to Data Productivity Cloud and used unchanged. You will need to create custom connectors that replicate the functionality of your query or extract components, and edit the pipelines to use these new connectors.

Replicate

In Matillion ETL, you needed to use a Replicate component to send the output of a component to several other components. In Data Productivity Cloud, any component can direct an output to multiple other components, so you should edit your pipelines to remove any Replicate components and replace them with basic output connections.

Automatic variables

Automatic variables aren't currently supported in Data Productivity Cloud. If you have jobs that rely on automatic variables, you may have to refactor the migrated pipeline to work without them, or accept that the job isn't suitable for migration at this time.

Shared jobs

Shared jobs aren't currently supported in Data Productivity Cloud. We recommend unpacking shared jobs and migrating them as standard jobs until the feature is supported, at which point you can repackage them.

Schedules

Schedules won't be migrated. Any required schedules will have to be recreated in Data Productivity Cloud after the jobs are migrated, tested, and approved for production work. Read Schedules to learn how to create Data Productivity Cloud schedules.

Python

Although a Python Script component is available in Hybrid SaaS projects, it isn't available in Full SaaS projects, and there are some limitations that apply in Hybrid SaaS:

  • Python scripts can't assume there is a filesystem that will persist after the script completes. Unlike Matillion ETL, which runs on a Linux VM with a persistent disk, Data Productivity Cloud agents consist of one or more containers that may stop and start as needed, losing the filesystem contents when they do (see the sketch after this list).
  • Data Productivity Cloud agents have much lower CPU and memory resources than the Matillion ETL VMs; they aren't designed for compute-intensive operations, so large scripts may run very slowly or cause the agent to become unstable.
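
Where a script genuinely needs local files, one pattern that fits these constraints is to treat the agent's filesystem as scratch space only and copy anything that must persist to cloud storage before the script ends. This is a minimal sketch assuming an AWS agent with the boto3 package and S3 access; the bucket and key names are hypothetical.

```python
# Minimal sketch: treat the agent filesystem as scratch space and copy anything
# that must persist to cloud storage before the script (and container) ends.
# The bucket and key names are hypothetical; boto3 and S3 access are assumed.
import csv
import tempfile

import boto3

with tempfile.TemporaryDirectory() as scratch:
    local_path = f"{scratch}/output.csv"

    # Write intermediate results to the temporary directory.
    with open(local_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "status"])
        writer.writerow([1, "ok"])

    # Persist the file to S3 before the temporary directory is discarded.
    s3 = boto3.client("s3")
    s3.upload_file(local_path, "my-hypothetical-bucket", "exports/output.csv")
```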

Because of these limitations, we don't recommend using the Python Script component if you can avoid it. Consider the following alternatives for some common uses of Python scripts:

| Python task | Alternative component |
| --- | --- |
| Print the current value of a variable | Print Variables |
| Update the value of a variable | Update Scalar |

In a Snowflake environment, we recommend that you use the Python Pushdown component in preference to the Python Script component. Python Pushdown executes a Python script using the Snowpark service in your Snowflake account, taking advantage of Snowflake's scalable and secure Python execution engine and returning the results to the component. This is the only way to execute Python scripts in a Full SaaS project, and it's also the recommended way to execute a Python script in a Hybrid SaaS project on Snowflake.
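
For illustration, the kind of script that suits Python Pushdown is plain Snowpark code that keeps the work inside Snowflake. This is a minimal sketch: it assumes the component makes a Snowpark session object available to the script as `session`, and the table name is a placeholder, so check the Python Pushdown documentation for the exact interface the component provides.

```python
# Minimal sketch of Snowpark-style code for Python Pushdown.
# Assumptions: a Snowpark `session` object is available to the script (check
# the component documentation), and MY_SCHEMA.STAGE_ORDERS is a placeholder.

# Run the query inside Snowflake; only the small result set is collected here.
rows = session.sql(
    "SELECT COUNT(*) AS ROW_COUNT FROM MY_SCHEMA.STAGE_ORDERS"
).collect()

row_count = rows[0]["ROW_COUNT"]
print(f"Staged rows: {row_count}")
```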

Bash

Although a Bash Script component is available in Hybrid SaaS projects, it isn't available in Full SaaS projects, and there are some limitations that apply in Hybrid SaaS:

  • Bash scripts can't assume there is a filesystem that will persist after the script completes. Unlike Matillion ETL, which runs on a Linux VM with a persistent disk, Data Productivity Cloud agents consist of one or more containers that may stop and start as needed, losing the filesystem contents when they do.
  • Data Productivity Cloud agents have much lower CPU and memory resources than the Matillion ETL VMs; they aren't designed for compute-intensive operations, so large scripts may run very slowly or cause the agent to become unstable.

Because of these limitations, we don't recommend using the Bash Script component if you can avoid it. If possible, consider using Python Pushdown to perform your Bash scripting tasks in Python instead.

Passwords and Secrets

It's not possible to save passwords in Data Productivity Cloud components. This is by design, to enforce security. All passwords are stored in secrets, which the component references. Secrets are stored in the Data Productivity Cloud secret manager in a Full SaaS environment, or in your own cloud platform's secret manager in a Hybrid SaaS environment.
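
In a Hybrid SaaS setup on AWS, for example, the cloud-side step might look like the sketch below. The secret name and value are hypothetical, and the secret must still be registered as a secret definition in the Data Productivity Cloud before a component can reference it.

```python
# Minimal sketch: store a password in AWS Secrets Manager so a Hybrid SaaS
# agent can reference it. The secret name and value are hypothetical, and the
# secret still needs to be registered as a secret definition in the
# Data Productivity Cloud before components can use it.
import boto3

secrets = boto3.client("secretsmanager")
secrets.create_secret(
    Name="dpc/postgres-staging-password",   # hypothetical name
    SecretString="example-password-value",  # hypothetical value
)
```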

Read Secret definitions for details.