Migrating from Matillion ETL - process
Public preview
This article is aimed at current users of Matillion ETL who wish to migrate their workloads to Matillion's Data Productivity Cloud.
Before beginning any migration project, read Migration considerations.
The migration process copies your Matillion ETL jobs as Data Productivity Cloud pipelines. It won't create any other required Data Productivity Cloud elements such as projects or environments, so you must manually create these items before beginning the migration. The full list of things you must create and configure in advance of migration is given in Prerequisites, below.
Migrating shared jobs from Matillion ETL follows the same processes outlined here, but requires some additional considerations, which are detailed below.
Prerequisites
Hub account
If you do not have a Hub account, you must create one in order to use the Data Productivity Cloud. Read Registering for a Hub account for details.
When you log in to the Hub as an admin user, you should see the Design data pipelines tile. If you do not see it, it may not yet have been enabled. Contact your Matillion Account Manager for assistance.
Note
Contact your Matillion Account Manager to discuss potential billing changes when you begin using the Data Productivity Cloud to run pipelines.
Users
Any members of your organization who will be using the Data Productivity Cloud must be added as users to the Hub. Read Manage account users for details.
Projects
Create a project where you want to import your migrated pipelines, as described in Projects. Later you can add additional projects if you want to segregate your pipelines.
Environments
Create an environment as described in Environments.
Branches
Your project will have a main branch created by default, but good practice is to perform all development work in a different branch, and only merge the work into main when it's ready for production. We therefore recommend you create a new branch with an appropriate name, such as metl-migrations, to hold the migrated pipelines. Read Branches for details.
Credentials
For security reasons, we do not migrate credentials such as secrets, passwords, or OAuths from Matillion ETL to the Data Productivity Cloud. Any secrets or other credentials you have set up in Matillion ETL will have to be recreated manually in the Data Productivity Cloud to allow your pipelines to run. Read Secret definitions, Cloud provider credentials, and OAuth for details.
Export from Matillion ETL
Migration is a two-stage process. The first stage is performed from within Matillion ETL, and involves exporting jobs from there by following the process given in Exporting.
Exported job information is saved in a JSON format file, in whichever default download location is used by your browser and operating system. This JSON file will be used by the import function of the Data Productivity Cloud.
You can select which jobs you want to export; it doesn't have to be an entire project. You can also export environment variables (which will become project variables in the Data Productivity Cloud).
Note
The maximum size of a JSON export file that the Data Productivity Cloud can import is 10 MB. If your export file is larger than this, you should split the export into several smaller export operations, for example by exporting individual jobs instead of an entire project group.
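Before attempting an import, you can verify locally that an export file parses as JSON and falls within the 10 MB limit described above. The following Python sketch is not part of Matillion's tooling, and the function name `check_export_file` is our own; it assumes only that the export is a single JSON file on disk.

```python
import json
import os

# Maximum export file size the Data Productivity Cloud will import (10 MB).
MAX_IMPORT_BYTES = 10 * 1024 * 1024

def check_export_file(path: str) -> bool:
    """Return True if the export file is valid JSON and within the size limit."""
    size = os.path.getsize(path)
    if size > MAX_IMPORT_BYTES:
        print(f"{path}: {size} bytes exceeds the 10 MB import limit; "
              "split the export into smaller operations (e.g. individual jobs).")
        return False
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)  # confirm the file parses as JSON
    except json.JSONDecodeError as err:
        print(f"{path}: not valid JSON ({err})")
        return False
    print(f"{path}: OK ({size} bytes)")
    return True
```

Running this over each export file before uploading saves a round trip to the import dialog when a file is oversized or was truncated during download.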
Import to the Data Productivity Cloud
Note
If you are an existing Data Productivity Cloud user, we strongly recommend you create a new, dedicated project and branch in which to perform the migration, in order to avoid interfering with your existing projects. If you import a job into a branch that has an existing pipeline of the same name, the import will overwrite that pipeline, and imported variables will likewise overwrite existing variables of the same name.
Remember that you can copy pipelines between projects later, after you have completed any required refactoring and verified that they will work.
- Open your Data Productivity Cloud project and branch.
- In the Pipelines tab, click ... next to the root folder, then click Import.
- In the file navigator, browse to the JSON file you exported from Matillion ETL and click Open.
- Before completing the import, the migration tool will analyze the Matillion ETL export and produce a report on its compatibility for use as Data Productivity Cloud pipelines. The results are presented under the following headings, and can be viewed in the import dialog or exported to a report:
- Jobs to be converted without changes: These are the jobs that can be imported without any changes.
- Jobs to be auto-converted: These are jobs that will be converted automatically into a form suitable for importing, using acceptable substitute parameters.
- Jobs to be manually refactored: These are typically jobs with components that have no equivalent in the Data Productivity Cloud. The jobs will be imported, but you will have to refactor them manually before they will work. The exact method of doing this will depend entirely on the design and function of the job.
- Project variables to be converted: This lists every environment variable exported from Matillion ETL that is compatible with the Data Productivity Cloud and will be imported as a project variable.
- Project variables to be auto-converted: These are environment variables that will be converted automatically into a form suitable for importing as project variables.
- Project variables to be manually refactored: These variables will be imported, but you will have to manually refactor the variables before they will work. The exact method of doing this will depend entirely on the function of the variable and the type of incompatibility.
- Project variables that will not be imported: These variables aren't compatible with the Data Productivity Cloud and won't be migrated. The import can still proceed without them, but this requires careful consideration: missing variables will almost certainly cause pipelines to fail, so you will need to take mitigating measures, either in the Matillion ETL job before exporting or in the Data Productivity Cloud pipeline after importing.
- If you are satisfied with the eligibility report, click Import to complete the import process.
Note
Regardless of the result of the job eligibility report, it's critical that you review all pipelines after importing, to ensure that they behave as expected and are suitable for production use.
- After the import process is complete, click Commit in the branch menu on the project bar. The Commit changes dialog will open.
- Add a brief description of the content you've imported to the Data Productivity Cloud.
- Click Commit to save these changes.
Note
We recommend committing your changes as soon as you have imported your work into the Data Productivity Cloud. If you begin editing your imported pipelines before committing, your first commit will be very large, which is not best practice.
Save report to PDF
To save the report to a PDF document, follow these steps. The exact options and settings may differ slightly depending on your browser.
- Click View full report in the Importing files dialog. This will open a new tab displaying the report.
- Right-click on the page and select Print, or press Command+P (Mac) or Ctrl+P (Windows). This will open the print settings dialog.
- Select the following settings for best results:
- Destination: Save as PDF.
- Layout: Landscape.
- Margins: Minimum.
- Scale: Select Customized and set the value to 80 or lower.
- Options: Select Headers and footers.
- Click Save.
- Enter your preferred filename and select a target location.
- Click Save.
After import
After import, there are a few additional manual changes you may need to make before a pipeline is production-ready.
- You will need to manually add any component-level credentials from Matillion ETL as secrets.
- You will need to manually specify environments to be used.
- For the variable types the Data Productivity Cloud doesn't support, you will need to test the new behavior, for example ensuring that your dates still work in expressions when they have been converted to strings.
- Unsupported components will be identified in the pipeline as "unknown component", and will need to be replaced with alternatives, or the pipeline will need to be refactored to make them unnecessary.
Migrating shared jobs
There are some additional factors to consider when migrating a Matillion ETL shared job to a Data Productivity Cloud shared pipeline. To correctly migrate shared jobs, use the process given below.
Before doing this, ensure that you fully understand the concepts and use of both shared jobs in Matillion ETL and shared pipelines in the Data Productivity Cloud.
Best practice for shared pipelines is to create them in their own dedicated project, separate from the projects which consume them. These instructions assume you will be doing that.
- In Matillion ETL, unpack the shared jobs you want to export.
Note
If you have the original source of the shared jobs, you can skip this step and export the source instead.
- Export the unpacked jobs as described above.
- In the Data Productivity Cloud, import the shared jobs, as described above. Ensure that you are importing into the project you are using to hold your shared pipelines.
- Refactor, test, and amend these pipelines as needed to ensure they perform the expected function in the Data Productivity Cloud.
- Share the pipelines, as described in Sharing a pipeline.
- In Matillion ETL, export the jobs that use the shared jobs.
- In the Data Productivity Cloud, import the exported jobs. These will become your consuming pipelines in the Data Productivity Cloud.
- Create a mapping to resolve any issues in the import that require refactoring. Read Migration mappings to learn how to do this.
- Refactor and test the imported pipeline to ensure it functions as expected and correctly calls the shared pipelines it needs.