
Migrating from Matillion ETL - process

This article is aimed at current users of Matillion ETL who wish to migrate their workloads to Matillion's Data Productivity Cloud.

Before beginning any migration project, read Migration considerations.

The migration process copies your Matillion ETL jobs as Data Productivity Cloud pipelines. It won't create any other required Data Productivity Cloud elements, such as projects or environments, so you must create these manually before beginning the migration. The full list of things you must create and configure in advance of migration is given in Prerequisites, below.


Prerequisites

Hub account

If you do not have a Hub account, you must create one in order to use Data Productivity Cloud. Read Registering for a Hub account for details.

When you log in to the Hub as an admin user, you should see the Design data pipelines tile. If you do not see it, it may not yet have been enabled. Contact your Matillion Account Manager for assistance.

Note

Please contact your Matillion Account Manager to discuss potential billing changes when you begin using the Data Productivity Cloud to run pipelines.

Users

Any members of your organization who will be using Data Productivity Cloud must be added as users to the Hub. Read Manage account users for details.

Projects

Create a project to contain the migrated pipelines, as described in Projects. You can add further projects later if you want to segregate your pipelines.

Environments

Create an environment as described in Environments.

Branches

Your project will have a main branch created by default, but good practice is to perform all development work in a different branch and only merge that work into main when it's ready for production. We therefore recommend you create a new branch with an appropriate name, such as metl-migrations, to hold the migrated pipelines. Read Branches for details.

Credentials

For security reasons, we do not migrate credentials such as secrets, passwords, or OAuths from Matillion ETL to the Data Productivity Cloud. Any secrets or other credentials you have set up in Matillion ETL will have to be recreated manually in Data Productivity Cloud to allow your pipelines to run. Read Secret definitions, Cloud provider credentials, and OAuth for details.


Export from Matillion ETL

Migration is a two-stage process. The first stage is performed from within Matillion ETL, and involves exporting your jobs by following the process given in Exporting.

Exported job information is saved as a JSON file in the default download location used by your browser and operating system. This JSON file will be used by the import function of the Data Productivity Cloud.
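If you want a quick sanity check that the export file is well-formed before importing it, a short script such as the one below can help. This is a minimal sketch: the filename is a hypothetical example, and the script assumes nothing about Matillion's export schema beyond the file being valid JSON.

```python
import json
from pathlib import Path

# Hypothetical filename: use whatever name your browser gave the export.
export_file = Path("~/Downloads/metl_export.json").expanduser()

with export_file.open(encoding="utf-8") as f:
    data = json.load(f)  # raises json.JSONDecodeError if the file is corrupt

# Summarize the top-level structure so you can confirm the export looks complete.
if isinstance(data, dict):
    for key, value in data.items():
        size = len(value) if isinstance(value, (list, dict, str)) else ""
        print(f"{key}: {type(value).__name__} {size}")
else:
    print(f"Top-level JSON type: {type(data).__name__}")
```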


Import to Data Productivity Cloud

Note

If you are an existing Data Productivity Cloud user, we strongly recommend you create a new, dedicated project and branch in which to perform the migration, to avoid interfering with your existing projects. If you try to import a job into a branch that has an existing pipeline of the same name, the import will overwrite the existing pipeline, and imported variables will also overwrite existing variables of the same name.

Remember that you can copy pipelines between projects later, after you have completed any required refactoring and verified that they will work.

  1. Open your Data Productivity Cloud project and branch.
  2. In the Pipelines tab, click Add then click Import Jobs (METL).
  3. In the Import Matillion ETL jobs dialog, click where indicated and browse to locate the JSON file you exported from Matillion ETL. Alternatively, drag and drop the file from your filesystem onto the dialog.
  4. Before completing the import, the migration tool will analyze the imported jobs and produce a report of their eligibility for use as Data Productivity Cloud pipelines. The results are presented under the following headings, and can be viewed in the import dialog or exported to a report:

    • Jobs to be converted: This lists every job in the export file, showing each job's type and the folder it will be imported to. The import will create any folders and sub-folders needed to mirror the project folder structure in Matillion ETL. You can expand the jobs to see the job variables they contain, and should address any that are shown as not converted.
    • Jobs that cannot be converted: These are typically jobs containing components that have no equivalent in the Data Productivity Cloud. To migrate these jobs, you must refactor them to operate without those components before exporting them from Matillion ETL. The exact method of doing this will depend entirely on the design and function of the job.
    • Project variables to be converted: This lists every variable exported from Matillion ETL that is compatible with the Data Productivity Cloud.
    • Project variables that cannot be converted: This lists every variable that isn't compatible with the Data Productivity Cloud and will not be migrated. These are typically variables defined as shared, or variables with the STRUCT format. The import can still proceed without these variables, but this requires careful consideration: missing variables will almost certainly cause pipelines to fail, so you will need to take mitigating measures, either in the Matillion ETL job before exporting or in the Data Productivity Cloud pipeline after importing (see the sketch after this list).
  5. If satisfied with the eligibility report, click Import to complete the import process.
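If a job depends on a STRUCT-format variable, one possible mitigation is to flatten it into individual scalar variables, either in the Matillion ETL job before exporting or recreated by hand in the Data Productivity Cloud after importing. The sketch below illustrates the idea only; the variable name and contents are hypothetical, and it assumes the STRUCT value can be represented as a nested key/value structure.

```python
# Minimal sketch: flatten a nested (STRUCT-like) value into scalar
# name/value pairs, each of which could be recreated as a simple variable.
# The variable name and contents are hypothetical examples.
struct_value = {
    "customer": {"id": 42, "region": "EMEA"},
    "batch_size": 500,
}

def flatten(value, prefix=""):
    """Yield (name, scalar) pairs from a nested dict."""
    for key, inner in value.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(inner, dict):
            yield from flatten(inner, name)
        else:
            yield name, inner

for name, scalar in flatten(struct_value, "order_config"):
    print(f"{name} = {scalar}")
# order_config_customer_id = 42
# order_config_customer_region = EMEA
# order_config_batch_size = 500
```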

Regardless of the result of the job eligibility report, it is critical that you review all pipelines after importing, to ensure that they behave as expected and are suitable for production use.


After import

After import, there are a few additional manual changes you may need to make before the pipeline is production ready.

  • You will need to specify a path to child jobs in Run Orchestration and Run Transformation components.
  • You will need to manually add any component-level credentials from Matillion ETL as secrets.
  • You will need to manually specify environments to be used.
  • For the variable types the Data Productivity Cloud doesn't support, you will need to test the new behavior, for example ensuring that your dates still work in expressions once they have been converted to strings (see the sketch after this list).
  • Unsupported components will be identified in the pipeline as "unknown component", and will need to be replaced with alternatives, or the pipeline will need to be refactored to make them unnecessary.
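To illustrate the kind of date check mentioned above: ISO-8601 date strings still order correctly when compared as plain strings, but anything involving date arithmetic needs an explicit parse once the value is a string rather than a date type. A minimal sketch, using a hypothetical variable value:

```python
from datetime import date, timedelta

# Hypothetical example: a value that was a date in Matillion ETL
# arrives in the Data Productivity Cloud as a plain string.
run_date = "2024-03-31"

# ISO-8601 strings still compare correctly as strings...
assert "2024-03-30" < run_date < "2024-04-01"

# ...but date arithmetic now requires parsing the string first.
next_day = date.fromisoformat(run_date) + timedelta(days=1)
print(next_day.isoformat())  # 2024-04-01
```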