Migration feature parity
As a Matillion ETL user, a lot of the look and feel of the Data Productivity Cloud will be familiar to you, but you shouldn't assume that it will always work in exactly the same way. There are some fundamental differences between the two platforms, and you should have a thorough understanding of these differences before you begin a migration, as some things may require manual effort or even refactoring.
This article lists features in Matillion ETL that are not currently supported in the Data Productivity Cloud. For a full discussion of this topic, read Migration considerations.
Not currently available in the Data Productivity Cloud
The following features aren't currently supported, or are only partially supported, but are intended to be implemented in the future. If you rely on any of these features, you may want to delay migration until we announce their availability. In some cases, workarounds have been identified which will allow your pipelines to continue to work without these features.
Category | Matillion ETL feature | Status and workaround |
---|---|---|
Components | General component parity | Most components will be implemented in a rolling program, except as noted below. |
Components | Database transactions (BEGIN, COMMIT, ROLLBACK) | These will be implemented in the future. No current workaround. |
Components | Temporary tables | These will be implemented in the future. As a workaround, refactor the pipleline to use permanent tables. |
Components | DBT | This is partially available but works differently in the Data Productivity Cloud. Some pipeline refactoring will be needed. |
Components | External tables | These will be implemented in the future. As a workaround, reactor the pipeline to create the tables using SQL. |
Components | Unload components (reverse ETL) | These will be implemented in the future. |
Scripting | Python (Snowflake) | Available but with limitations. Use Python Pushdown in preference. |
Scripting | Python (Redshift and Databricks) | Currently available, with limitations, only in Hybrid SaaS solutions. Python Pushdown will be implemented in the future. |
Pipeline set-up | Webhook and Queue Triggers for pipelines | These will be implemented in the future. As a workaround, use Lambda or the equivalent to call the API. |
Pipeline set-up | Pass variables into an API trigger | Not currently implemented. |
Staging/Ingestion | User-defined stages (Snowflake) | These will be implemented in the future. |
Staging/Ingestion | Database Query | Database Query is implemented, but not all options are available. Missing options are: IBM DB2, Netezza, SAPHana, Teradata. |
Staging/Ingestion | Data Transfer | Data Transfer is implemented, but not all data are available. Missing options are: Box, Dropbox, HDFS, HTTP, Microsoft Exchange, Google Cloud Storage. |
Staging/Ingestion | OAuth authentication for connectors | Some OAuths are implemented. |
Variables | Automatic variables | These will be implemented in the future. |
Variables | DATETIME variables | These will be implemented in the future. Currently migrated as STRING, so pipelines may need refactoring for this. |
Variables | Export variables | These will be implemented in the future. |
Variables | Setting concurrency to a variable | Currently it is not possible to control the "Concurrency" behavior of an iterator component with a variable, you will need to hard-code this. |
Variables | STRUCT data type | These will be implemented in the future. |
Required refactoring due to Data Productivity Cloud improvement
We've made changes to key areas in the Data Productivity cloud, so you'll need to refactor any pipelines using these features or components.
Category | Matillion ETL feature | Workaround |
---|---|---|
Components | Replicate | Not needed, as all components have multiple outputs by default. In the migrated pipeline, remove the "Unknown Component" and connect its input directly to all of the downstream components. |
Staging/Ingestion | API Extract | Replaced with Custom connectors. |
Staging/Ingestion | API Query | Replaced with Custom connectors. |
Staging/Ingestion | Database driver for MySQL | Data Productivity Cloud uses the MariaDB driver for accessing MySQL. Before migrating, test that the MariaDB driver works in your use case. |
Components | Deprecated components | Replace the deprecated component with the newer component in the Matillion ETL job before migration, or migrate and then replace the "Unknown Component" in the Data Productivity Cloud pipeline. |
Scripting | Bash | Not supported in Full SaaS solutions. Currently available, with limitations, in Hybrid SaaS solutions. |
Scripting | Jython | Use Python Pushdown instead. |
Variables | Stacked iterators | Refactor to split into sub-pipelines, so each has only one iterator. |
Pipeline set-up | Shared jobs | Shared pipelines are supported, but the migration of shared jobs to shared pipelines requires some additional steps, described here. |
Detailed discussion of differences
API Query and API Extract
Data Productivity Cloud does not have API Query or API Extract components, and doesn't use the concept of query profiles or extract profiles. Instead, you can create custom connector components, which can be configured to connect to any compatible third-party API.
If you have a job that uses API Query or API Extract, it can't be migrated to Data Productivity Cloud and used unchanged. You will need to create custom connectors that replicate the functionality of your query or extract components, and edit the pipelines to use these new connectors.
Automatic variables
Automatic variables aren't currently supported in the Data Productivity Cloud. If you have jobs that rely on automatic variables, you may have to refactor the migrated pipeline to work without them, or accept that the job isn't suitable for migration at this time.
Bash
Although a Bash Script component is available in Hybrid SaaS projects, it isn't available in Full SaaS projects, and there are some limitations that apply in Hybrid SaaS:
- Bash scripts can't assume there is a filesystem that will persist after the script completes. Unlike Matillion ETL that runs on a Linux VM with a persistent disk, Data Productivity Cloud agents consist of one or more containers that may stop and start as needed, thus losing the file system contents.
- Data Productivity Cloud agents have much lower CPU and memory resources than the Matillion ETL VMs; they are not designed for any compute-intensive operations, thus large scripts may run very slowly or cause the agent to become unstable.
Because of this, we do not recommend the use of the Bash Script component if you can avoid it. If possible, consider the use of Python Pushdown to perform the Bash scripting tasks with Python.
Passwords and Secrets
It's not possible to save passwords in Data Productivity Cloud components. This is by design, to enforce security. All passwords are stored in secrets, which the component references. Secrets are stored in the Data Productivity Cloud secret manager in a Full-SaaS environment, or in your own cloud platform's secret manager in a Hybrid-SaaS environment.
Read Secret definitions for details.
Python
Although a Python Script component is available in Hybrid SaaS projects, it isn't available in Full SaaS projects, and there are some limitations that apply in Hybrid SaaS:
- Python scripts can't assume there is a filesystem that will persist after the script completes. Unlike Matillion ETL that runs on a Linux VM with a persistent disk, Data Productivity Cloud agents consist of one or more containers that may stop and start as needed, thus losing the file system contents.
- Data Productivity Cloud agents have much lower CPU and memory resources than the Matillion ETL VMs; they are not designed for any compute-intensive operations, thus large scripts may run very slowly or cause the agent to become unstable.
- Only Python 3 is available in the Data Productivity Cloud. The migration tool will warn you that Python 3 is the only option if you have a component that uses the Python 2 or Jython interpreters. You may need to update your Python 2 or Jython scripts to be compatible with Python 3.
Because of these limitations, we don't recommend the use of the Python Script component if you can avoid it. Consider the following alternatives for some common uses of Python scripts:
Python task | Alternative component |
---|---|
Print the current value of a variable | Print Variabes |
Update the value of a variable | Update Scalar |
In a Snowflake environment, we recommend that you use the Python Pushdown component in preference to the Python Script component. Python Pushdown executes a Python script using the Snowpark service in your Snowflake account, taking advantage of Snowflake's existing scalable and secure Python execution engine to execute the code and return the results to the component. This is the only way to execute Python scripts in a Full SaaS project, but also the recommended way to execute a Python script in a Hybrid SaaS project on Snowflake.
Replicate
In Matillion ETL, you needed to use a Replicate component to send the output of a component to several other components. In the Data Productivity Cloud, any component can direct an output to multiple other components, so you should edit your pipelines to remove any Replicate components and replace them with basic output connections.
Schedules
Schedules won't be migrated. Any required schedules will have to be recreated in the Data Productivity Cloud after the jobs are migrated, tested, and approved for production work. Read Schedules to learn how to create Data Productivity Cloud schedules.
Shared jobs
Shared jobs can be migrated to shared pipelines in the Data Productivity Cloud, but need to be migrated in a specific way. Read Migrating shared jobs for details.
S3 Unload for Amazon Redshift
In the S3 Unload component of the Data Productivity Cloud for Amazon Redshift, using the Fixed Width
parameter hides the Escape
and Add quotes
parameters, as these aren't compatible with Fixed Width
. In Matillion ETL for Amazon Redshift, these parameters are accessible in the S3 Unload component but won't pass validation.
An edge case can arise when Fixed Width
is selected while both the Escape
and Add quotes
parameters are set to "Yes". Although this configuration is incompatible, a user may still try to migrate a job containing an S3 Unload component with these settings. The migration tool will block this action.
Text Output
The Text Output component in Matillion ETL is migrated to the S3 Unload component in the Data Productivity Cloud for Amazon Redshift. This is not optional and is required to use the Text Output component in the Data Productivity Cloud. The migration is performed using the following parameter mappings:
Text Output | S3 Unload | Differences |
---|---|---|
Schema | Schema | No change. |
Table Name | Table Name | No change. |
S3 URL Location | S3 URL Location | No change. |
S3 Object Prefix | S3 Object Prefix | No change. |
Delimiter | If the delimiter value is not a comma, the S3 Unload component will set the Data File type to Delimited , and copy the delimiter value (specified in the Text Output Delimiter parameter) into its own delimiter parameter. |
|
Compress Data | Compress Data | No change. |
Null As | NULL As | No change. |
Output Type | Data File Type | When the Output Type parameter of Text Output is set to "Escaped," the Escape parameter in S3 Unload will be set to "Yes." |
Multiple Files | Parallel | Text Output file names will appear differently. S3 Unload will always append the file number on to the end of the prefix. Text Output only appends numbers if there are multiple files and it also appends a _ before the numbers, whereas S3 Unload adds the numbers on to the end. |
Row limit per file | This Text Output parameter can't be migrated. S3 Unload only allows limiting by file size. The S3 Unload component uses the Max File Size parameter. For more information, read UNLOAD, which is slightly different to how the Row limit per file parameter works in the Text Output component. |
|
Include Header | Include Header | No change. |
Null Handling | This Text Output parameter can't be migrated. Null handling isn't supported in the S3 Unload component. | |
Encryption | Encryption | No change. |
KMS Key ID | KMS Key Id | No change. |
When there are parameter differences in the migration tool, a warning will be displayed advising you to "Please refer to documentation for the differences".
Variables
Your job variables, environment variables, and grid variables will be migrated. Environment variables become project variables in the Data Productivity Cloud, and job variables become pipeline variables. Grid variables become a type of pipeline variable. Read Variables and Grid variables for details.
There are some key differences to note:
- Date formats for variables aren't currently supported in the Data Productivity Cloud. The migration tool will automatically convert all DATETIME variables to STRING. Support for date formats will be added in the future.
- STRUCT formats for variables aren't currently supported in the Data Productivity Cloud. Support for STRUCT formats will be added in the future.
- Scalar and grid pipeline variables can't share the same name, as a grid is now a type of pipeline variable, not a separate class of variable.
The following Matillion ETL component for manipulating grid variables isn't available yet in the Data Productivity Cloud but is coming soon:
- Append to grid
This won't affect the migration of grid variables, but jobs that use this component will have to be refactored to work without it.
The Remove from Grid component has been improved in the Data Productivity Cloud so that the Fixed/Grid parameter is no longer needed. This means that when migrating a Remove from Grid component that had Fixed/Grid set to Grid, you will lose your selected grid mappings. To put the mappings back, select the Use Grid Variable option of the Values Grid property and select the grid mappings again. If the component had Fixed/Grid set to Fixed, the migrated component will continue to work with no additional mitigation needed.