Migration considerations

This article is aimed at current users of Matillion ETL who wish to migrate their workloads to Matillion's Data Productivity Cloud.

There are two different paths to migration:

  • Self Serve, which is detailed in this article.
  • Assisted, where you are supported by a Matillion partner. Contact your Matillion Account Manager to discuss support for your migration project.

The process of migrating your workloads is given in Migrating from Matillion ETL - process. However, we strongly recommend that you read and understand all of the considerations discussed in this article before attempting any migration project. This article will outline what you should do to get ready for your migration, and what best practices to follow during the migration.

You should become familiar with the following terminology differences and bear them in mind when reading Data Productivity Cloud documentation.

Matillion ETL          Data Productivity Cloud
Job                    Pipeline
Job variable           Pipeline variable
Environment variable   Project variable

What do you need to consider before migrating?

The Matillion Data Productivity Cloud and Matillion ETL are different products, and although you will see similarities in look and feel, they have different architectures and different billing models. As a Matillion ETL user, it is important to understand the principles of Data Productivity Cloud in order to make an informed decision about migrating. In particular, you need to understand the differences in architecture and billing.

You should also be aware that there are feature differences between Matillion ETL and Data Productivity Cloud. If your workloads rely on Matillion ETL features that are not currently present in Data Productivity Cloud, this may not be the right time to migrate.

It is important to realise that you can start using the Matillion Data Productivity Cloud in parallel with Matillion ETL, allowing you to test and evaluate features, and even to build and run new production pipelines, without migrating anything until you are ready. You also don't have to migrate every workload as a single operation. You can migrate a single workload and get it into production in Data Productivity Cloud before moving on to the next.


Plan your migration

Once you have determined that migration is right for you at this time, and you have read and understood all the implications outlined in this document, we recommend taking the following structured approach to the migration.

  1. Ensure you've set up a Data Productivity Cloud project and all required credentials. These steps are described in detail below.
  2. Perform a migration dry run:

    1. See what will migrate successfully.
    2. See what will be changed by the migration.
    3. See what can't be automatically migrated and may need to be refactored.
  3. Migrate the pipelines into your Data Productivity Cloud project.

  4. Refactor pipelines:

    • Where you have identified that changes are needed to make the pipeline run. Components not supported in Data Productivity Cloud will show as "unknown component" in the pipeline, and will need to be replaced with alternatives. Components that have migrated may still need modifying before they will validate.
    • Where you have identified that a different pipeline design will take better advantage of Data Productivity Cloud's scalability.
    • Where a component includes elements that are not migrated, such as passwords and credentials.
  5. Test all migrated pipelines before attempting any production runs.

  6. Schedule production runs when you are certain that the pipeline will behave as expected.
  7. Remove the pipeline from your Matillion ETL schedules.

Remember that you don't have to migrate all workloads in a single operation. We recommend that you migrate incrementally and ensure one workload works before migrating the next, continuing to run Matillion ETL for workloads that haven't yet been migrated.


Steps to take before migration

Create a Hub account

As a Matillion ETL user, you may have a Hub account already. If you do not, you must create one in order to use Data Productivity Cloud. Read Registering for a Hub account for details.

Note

Please contact your Matillion Account Manager to discuss potential billing changes when you begin using the Data Productivity Cloud to run pipelines.

Create Hub users

Any members of your organization who will be using Data Productivity Cloud must be added as users to the Hub. Read Manage account users for details.

Plan your project

Data Productivity Cloud uses projects to logically group and separate workloads. You will be familiar with the concept of projects from Matillion ETL, and you may wish to create a set of projects that mirrors your Matillion ETL project structure. However, Data Productivity Cloud projects have different configuration options than Matillion ETL projects, so it is important to read Projects in order to understand the differences.

Different projects can have different users, permissions, and credentials associated with them, so you may want to take some time to plan this.

You need to consider the following choices when planning your first Data Productivity Cloud project.

Full SaaS or Hybrid SaaS

Read Matillion Full SaaS vs Hybrid SaaS for an explanation of the differences.

We recommend that you begin with a Full SaaS project, as this is the fastest and easiest way to start learning the Data Productivity Cloud and getting your initial pipelines running. You can create a Hybrid SaaS project later, and run both types of project simultaneously if required.

For a Hybrid SaaS project, you will be installing our agent software within your own cloud infrastructure and data plane. This may be beneficial if:

  • Your data locality requirements need the agent to run in a specific region that Matillion doesn't currently provide agents in.
  • Your use case requires proximity between the data processing agent and your source systems.
  • You need to access systems (such as databases or file storage) that only have network access from within your VPC/VNet.

Read Create an agent in your infrastructure for details of how to set up the agent needed for a Hybrid SaaS project.

Matillion hosted Git or bring your own Git

Data Productivity Cloud uses Git version control to keep track of changes made to projects and allow you to collaborate without overwriting work. Read Git in Designer to learn more.

For each project, you must decide whether you will use Data Productivity Cloud's own hosted Git repository, or whether you will connect to an external Git repository you control.

  • Matillion-hosted Git: Matillion sets up and manages a Git repository on your behalf, so you have no Git system to manage. This is the easiest option to configure and manage, but you can't get direct access to the repository from other tools. If you don't already have your own Git repository, you can use this option for trials.
  • Connect your own Git repository: Connecting a GitHub repository to the Data Productivity Cloud is a multi-step process with certain prerequisite actions. This assumes you are already familiar with the use of Git repositories.

For long-term production use, we recommend that you use your own Git provider, as the capabilities it can support will be much greater than those provided by Matillion-hosted Git.

Create the project

Create a single project as described in Projects. Later you can add additional projects if you want to segregate your pipelines.

Create an environment

An environment defines the connection between a project and your chosen cloud data warehouse. Environments include default configuration, such as a default warehouse, database, and schema, that can be used to pre-populate component configurations in your pipelines. Read Environments for details.

Create credentials

Create any secrets, passwords, or OAuths that your pipelines will need to connect to third-party services. Read Secret definitions, Cloud provider credentials, and OAuth for details.

Create branches

Data Productivity Cloud is designed around the concept of branches to provide version control and collaboration features. This is similar to the optional Git integration feature in Matillion ETL, but in Data Productivity Cloud it is not optional and all pipelines must be assigned to Git branches. There are also differences between how Git is implemented in Matillion ETL and Data Productivity Cloud, so regardless of whether you currently use Git in Matillion ETL or not, ensure you have read and understood Git in Designer. You should then create your first Branches to contain your migrated pipelines.

Your project will have a main branch created by default, but good practice is to perform all development work in a different branch, and only merge the work into main when ready for production.
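The branch-then-merge practice described above can be sketched with plain Git commands. This is most relevant if you connect your own Git repository; the repository location, branch name, and file name below are hypothetical examples, not Matillion-specific conventions:

```shell
# Sketch of the recommended workflow: develop in a feature branch,
# merge into main only when ready for production.
set -e
repo=$(mktemp -d)            # hypothetical local clone of your pipeline repo
cd "$repo"
git init -q -b main
git config user.email "dev@example.com"
git config user.name "Example Developer"

echo "initial" > pipeline.yaml
git add pipeline.yaml
git commit -q -m "Initial pipeline"

# Do all development work in a separate branch, not in main
git checkout -q -b feature/migrate-orders
echo "migrated" > pipeline.yaml
git commit -qam "Migrate orders pipeline from Matillion ETL"

# Merge into main only once the migrated pipeline has been tested
git checkout -q main
git merge -q feature/migrate-orders
git log --oneline -1
```

With a remote repository you would also `git push` the branch and merge via a pull request in your Git provider, rather than merging locally.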