Migrating from Matillion ETL - process

This article is aimed at current users of Matillion ETL who wish to migrate their workloads to Matillion's Data Productivity Cloud.

Before beginning any migration project, read Migration considerations.

Migration of a job is a three-step process:

  1. Export from Matillion ETL.
  2. Import to the Data Productivity Cloud.
  3. Test the migrated pipeline.

Migrating shared jobs from Matillion ETL follows the same processes outlined here, but requires some additional considerations, detailed in Migration: Shared jobs.


Prerequisites

The automated migration tool recreates your Matillion ETL jobs as Data Productivity Cloud pipelines. It won't create any other required Data Productivity Cloud elements such as projects or environments, so you must manually create these items before beginning the migration.

The full list of things you must create and configure in advance of migration is provided in Migrating from Matillion ETL - prerequisites.

If you are an existing Data Productivity Cloud user, we strongly recommend you create a new, dedicated project and branch in which to perform the migration, to avoid interfering with your existing projects. If you try to import a job into a branch that has an existing pipeline of the same name, the import will overwrite the existing pipeline, and imported variables will also overwrite existing variables of the same name. You can copy pipelines between projects later, after you have completed any required refactoring and verified that they will work.


Specific considerations

Some components and features need specific treatment, mitigation, or workarounds when migrated. If you use any of the following features, make sure you understand what specific treatment each one will require.

API trigger

API triggers are supported in the Data Productivity Cloud, but the API works differently. Read the API documentation for further details.

You may need to update your trigger scripts to work with the Data Productivity Cloud API. Evaluate your triggers on a case-by-case basis and update where needed, guided by the API documentation.
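As an illustration only, an updated trigger script might look like the sketch below. The base URL, endpoint path, payload fields, and environment variable names are placeholders rather than the documented API, so take the exact request format from the API documentation.

```python
# Illustrative sketch of a trigger script for the Data Productivity Cloud API.
# The base URL, endpoint path, payload fields, and environment variable names
# are placeholders -- consult the API documentation for the real values.
import os

import requests

API_BASE = "https://api.example.matillion.com"   # placeholder base URL
PROJECT_ID = os.environ["DPC_PROJECT_ID"]        # hypothetical variable names
API_TOKEN = os.environ["DPC_API_TOKEN"]


def trigger_pipeline(pipeline_name: str, environment_name: str) -> dict:
    """Request a pipeline execution and return the API response."""
    response = requests.post(
        f"{API_BASE}/v1/projects/{PROJECT_ID}/pipeline-executions",  # assumed path
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"pipelineName": pipeline_name, "environmentName": environment_name},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(trigger_pipeline("my-orchestration-pipeline", "production"))
```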

Git

Unlike Matillion ETL, where Git integration is an optional feature, the Data Productivity Cloud is built with Git as an integral element, providing pipeline version control and making it simple to collaborate on and manage data pipelines within your team. Read Git in Designer to learn more about this feature of the Data Productivity Cloud.

If you have never used the optional Git feature within Matillion ETL, there is nothing to migrate. You simply need to select which type of Git repository you want to use and configure it prior to creating your projects.

However, if you do currently use Git in Matillion ETL, there are steps you need to take to ensure that essential configuration information is migrated to allow you to continue to use your Git repository:

  1. Create a Data Productivity Cloud project using the Connect your own Git repository option.
  2. Migrate jobs that use Git.

The migration tool will automatically recognize that Git is in use, and will convert the necessary configuration from Matillion ETL to the Data Productivity Cloud.

OAuths

For security reasons, we do not migrate credentials such as OAuths from Matillion ETL to the Data Productivity Cloud. Any OAuths you have set up in Matillion ETL will have to be recreated manually in the Data Productivity Cloud to allow your pipelines to run. Read OAuth for details.

Secrets

For security reasons, we do not migrate credentials such as secrets and passwords from Matillion ETL to the Data Productivity Cloud. Any secrets or other credentials you have set up in Matillion ETL will have to be recreated manually in the Data Productivity Cloud to allow your pipelines to run. Read Secret definitions and Cloud provider credentials for details.

Passwords can't be entered directly into Data Productivity Cloud components. This is by design, to enforce security. All passwords must be stored in secrets, which the component references. Secrets are stored in the Data Productivity Cloud secret manager in a Full SaaS environment, or in your own cloud platform's secret manager in a Hybrid SaaS environment.

  1. Create secrets with the credentials that your pipelines will need to connect to third-party services (a sketch follows this list). Read Secret definitions and Cloud provider credentials for details.
  2. Update components to point to the secrets you have created.
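A minimal sketch of step 1 is shown below for a Hybrid SaaS deployment whose agent runs in AWS, using AWS Secrets Manager. The secret name, key, and region are illustrative assumptions; after creating the secret you still need to add a matching secret definition in the Data Productivity Cloud so that components can reference it.

```python
# Sketch: store a third-party credential in AWS Secrets Manager (Hybrid SaaS).
# The secret name, JSON key, and region are examples only.
import json

import boto3

client = boto3.client("secretsmanager", region_name="eu-west-1")  # example region

client.create_secret(
    Name="dpc/snowflake/etl-user",                        # hypothetical secret name
    SecretString=json.dumps({"password": "REPLACE_ME"}),  # placeholder value
)
```

In a Full SaaS environment you would instead create the secret directly in the Data Productivity Cloud secret manager.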

Webhook and queue triggers for pipelines

The Data Productivity Cloud doesn't natively support triggering pipelines from a webhook or a queue such as SQS or Azure Queue. However, the Data Productivity Cloud architecture shouldn't suffer from the internal queuing, scaling, or availability limitations that can make a queuing solution necessary in Matillion ETL, so triggering from a webhook or queue is unnecessary in most scenarios.

We recommend using the Data Productivity Cloud API for running pipelines directly.

If you need to integrate the Data Productivity Cloud with an existing system based on webhooks or queues (such as triggering a pipeline when a file lands in an S3 bucket), we recommend using AWS Lambda or Azure Functions to implement an API call based on an event.
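As an illustration, the sketch below shows an AWS Lambda handler that starts a pipeline when an S3 object-created event fires. The execution URL, payload fields, pipeline name, and variable names are placeholders, not the documented API; take the real request format from the Data Productivity Cloud API documentation.

```python
# Illustrative AWS Lambda handler: start a pipeline when a file lands in S3.
# The execution URL, payload fields, and names below are placeholders.
import json
import os
import urllib.request

API_URL = os.environ["DPC_PIPELINE_EXECUTION_URL"]  # hypothetical: full execution endpoint
API_TOKEN = os.environ["DPC_API_TOKEN"]


def lambda_handler(event, context):
    # Take the object key from the S3 event and pass it to the pipeline.
    key = event["Records"][0]["s3"]["object"]["key"]

    payload = json.dumps({
        "pipelineName": "load-landed-file",   # hypothetical pipeline name
        "environmentName": "production",
        "scalarVariables": {"s3_key": key},   # assumed payload field
    }).encode("utf-8")

    request = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```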

Variables

Read the following articles to understand how variables will be migrated to the Data Productivity Cloud:

Components

Some components need specific treatment when migrated. The following articles describe this in detail:


Export from Matillion ETL

The first step of the migration is performed from within Matillion ETL, and involves exporting jobs by following the process given in Exporting. When you export a job, you should also export any environment variables it uses (these will become project variables in the Data Productivity Cloud).

Exported job information is saved as a JSON file in the default download location used by your browser and operating system. This JSON file is used by the import function of the Data Productivity Cloud.

You can select which jobs you want to export—it doesn't have to be an entire project.


Import to the Data Productivity Cloud

  1. Open your Data Productivity Cloud project and branch.
  2. In the Files tab, click ... next to the folder you want to import to, or click Add at the top, then click Import from the drop-down.
  3. In the file navigator, browse to the JSON file you exported from Matillion ETL and click Open.
  4. Before completing the import, the migration tool will analyze the Matillion ETL export and produce a report of its compatibility for use as a Data Productivity Cloud pipeline. The results are presented under the following headings in the Importing files panel:

    • Jobs tab:

      • Converted without changes: These are the jobs that can be imported without any changes.
      • Auto-converted: These are jobs that will be converted automatically into a form suitable for importing, using acceptable substitute parameters.
      • Manual refactor: These will typically be jobs containing components that have no equivalent in the Data Productivity Cloud. These jobs will be imported, but you will have to manually refactor the pipelines before they will work. The exact method of doing this will depend entirely on the design and function of the workload, taking account of the specific considerations listed in Migration in detail.
      • Unable to convert: These are jobs that can't be converted. You may need to refactor the job in Matillion ETL before exporting, or accept that the workload is not suitable for migrating at this time.
    • Project variables tab:

      • Converted without changes: This lists every environment variable exported from Matillion ETL that is compatible with the Data Productivity Cloud and will be imported as a project variable without changes.
      • Auto-converted: These are environment variables that will be converted automatically into a form suitable for importing as project variables, such as converting DATETIME types to STRING.
      • Manual refactor: These variables will be imported, but you will have to manually refactor the variables before they will work. The exact method of doing this will depend entirely on the function of the variable and the type of incompatibility.
      • Unable to convert: These are variables that can't be converted. You will need to refactor any pipelines that used this variable so they will work in some alternate way.
  5. To view a detailed version of this report, click View full report. The report will open in a new browser tab, and can be saved or printed from there as required.

    The detailed report gives information down to the component level, and explains any auto-conversion actions that have been undertaken, or specific manual refactoring that is needed to make the component validate. Examples of these exceptions might include:

    • Component will be imported as an "Unrecognized Component".
    • Python version will be converted to Python 3.
    • Secret references will need to be recreated and the components updated to use them.

    An extract of a full report illustrating these exceptions is shown below:

    [Image: extract of an Importing files report]

  6. If satisfied with the eligibility report, click Import to complete the import process.

  7. Regardless of the result of the migration report, review and fully test any imported pipelines to ensure that they behave as expected and are suitable for production use.
  8. After the import process is complete, click Commit in the branch menu on the project bar. The Commit changes dialog will open.
  9. Add a brief description of the content you've imported to the Data Productivity Cloud.
  10. Click Commit to save these changes.

Note

We recommend committing your changes as soon as you have imported your work into the Data Productivity Cloud. If you begin making changes to your imported pipelines before committing, your first commit will be very large and difficult to review, which is not best practice.


After import

After import, there are some more manual changes you might need to make before the pipeline is production ready.

  • You will need to manually add any component-level credentials from Matillion ETL as secrets.
  • You will need to manually specify environments to be used.
  • For the variable types the Data Productivity Cloud doesn't support, you will need to test the new behavior, for example ensuring that your dates still work in expressions after they have been converted to strings (see the sketch after this list).
  • The majority of components will import without issue, but a small number of Matillion ETL components are unsupported in the Data Productivity Cloud. These will be identified in the pipeline as "Unknown Component", and will need to be replaced with alternatives, or the pipeline will need to be refactored to make them unnecessary.
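For example, a Python Script component that previously received a DATETIME variable will now receive a string, and may need to parse it explicitly before doing date arithmetic. The sketch below assumes an ISO-style format; check the actual format your migrated variables use.

```python
# Sketch: handle a migrated DATETIME variable that now arrives as a STRING.
# The variable value and the "%Y-%m-%d %H:%M:%S" format are assumptions.
from datetime import datetime, timedelta

run_date_str = "2024-01-31 06:00:00"  # value of a migrated variable, e.g. run_date

# Parse the string back into a datetime so date arithmetic still works.
run_date = datetime.strptime(run_date_str, "%Y-%m-%d %H:%M:%S")
watermark = run_date - timedelta(days=1)

# Format it again before passing it back to a component expression or query.
print(watermark.strftime("%Y-%m-%d %H:%M:%S"))
```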

Always review and test every migrated pipeline, to ensure that it behaves as expected and is suitable for production use.