Migration considerations

This article is aimed at current users of Matillion ETL who want to migrate their workloads to Matillion's Data Productivity Cloud. It will outline what you should do to get ready for your migration, and what best practices to follow during the migration.

There are two different paths to migration:

  • Self Serve, which is detailed in Migration process.
  • Assisted, where you are supported by Matillion or a Matillion partner. Contact your Matillion Account Manager to discuss support for your migration project.

Migrating workloads from Matillion ETL to the Data Productivity Cloud is designed to be as frictionless as possible, with an automated migration tool that converts Matillion ETL components into their Data Productivity Cloud equivalents. Migration of a job is a three-step process:

  1. Export from Matillion ETL.
  2. Import to the Data Productivity Cloud.
  3. Test the migrated pipeline.
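The steps above can be sketched as a simple pre-import check. Everything in this example is an illustrative assumption: the file name, the JSON structure, and the job names are not the real Matillion ETL export format, which is defined by your Matillion ETL version. It simply shows the kind of sanity check worth running between steps 1 and 2, and how to build a checklist for step 3.

```shell
# Hypothetical pre-import check of a Matillion ETL job export.
# The file name and JSON structure below are illustrative assumptions.
EXPORT_FILE="my_job_export.json"

# Simulate an exported job file so this sketch is self-contained.
cat > "$EXPORT_FILE" <<'EOF'
{"objects": [{"name": "load_customers", "type": "orchestration"}]}
EOF

# Step 1 (export): confirm the exported file is valid JSON before import.
python3 -m json.tool "$EXPORT_FILE" > /dev/null && echo "export OK"

# Steps 2-3 (import, test): list the job names in the export so each
# migrated pipeline can be ticked off after testing.
python3 - "$EXPORT_FILE" <<'PY'
import json, sys
for obj in json.load(open(sys.argv[1]))["objects"]:
    print(obj["name"])
PY
```

Keeping a list of exported job names like this makes it easy to confirm that every migrated pipeline has been individually tested before you decommission the original job.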

There are some prerequisite steps you need to take to set up the Data Productivity Cloud environment before migrating, and some decisions you need to make about how you will use the Data Productivity Cloud, as discussed below.


Benefits of migrating to the Data Productivity Cloud

As a user of Matillion ETL, you will find that the Data Productivity Cloud builds on what you already know while unlocking new capabilities to help you work smarter. Key benefits of the Data Productivity Cloud include:

  • The SaaS infrastructure model means that you have no infrastructure to set up, maintain, monitor, or update. It's all done for you, and new customers can be developing pipelines within ten minutes of creating a Data Productivity Cloud account.
  • Pipelines scale virtually infinitely to meet any performance challenge, with an architecture that guarantees scale will never be a roadblock.
  • We bake intelligence into every step of pipeline development, and automate loading, transformation, and orchestration, so your team can focus on adding value rather than on mundane tasks.

With Data Productivity Cloud, you can:

  • Unlock insights from unstructured data using AI-driven techniques.
  • Boost productivity with AI-powered insights, making data transformation easier for everyone.
  • Eliminate infrastructure headaches.
  • Scale effortlessly without DevOps complexity.
  • Achieve greater cost efficiency by reducing operational overhead and maximizing ROI.

Migration to the Data Productivity Cloud isn't just an upgrade, it's a strategic leap forward that preserves your existing investment while unlocking unprecedented capabilities. It offers unlimited scale across infinite users, projects, and pipelines, and lets you harness the full push-down architecture of your cloud data platform without losing control of costs.


What do you need to consider before migrating?

  • The Data Productivity Cloud supports Snowflake, Databricks, and Amazon Redshift.
  • Snowflake on GCP is only available in the Data Productivity Cloud in a Full SaaS environment at this time.
  • Google BigQuery isn't supported in the Data Productivity Cloud at this time.
  • Azure Synapse isn't supported in the Data Productivity Cloud at this time.

As a Matillion ETL user, you'll find that the Data Productivity Cloud has a very familiar look and feel, but there are some small terminology differences you'll need to bear in mind when reading the documentation:

Matillion ETL          Data Productivity Cloud
---------------------  -----------------------
Job                    Pipeline
Job variable           Pipeline variable
Environment variable   Project variable

The Data Productivity Cloud and Matillion ETL have different architectures, and we recommend reading the Data Productivity Cloud architecture documentation to gain an understanding of how the Data Productivity Cloud operates before you migrate.

You can start using the Data Productivity Cloud in parallel with Matillion ETL, allowing you to test and evaluate features, and even to build and run new production pipelines, without migrating anything until you're ready. You also don't have to migrate every workload as a single operation—you can migrate a single workload, and get that into production in the Data Productivity Cloud before moving onto the next.


Plan your migration

Once you have determined that migration is right for you at this time, and you have read and understood all the implications outlined in this document, we recommend taking the following structured approach to the migration.

You don't have to migrate everything from Matillion ETL to the Data Productivity Cloud at one time. Start with one workload, which may be a single job or a set of co-dependent jobs in a single project, and ensure that workload runs correctly before migrating the next. Continue to run Matillion ETL for workloads that haven't yet been migrated.


Data Productivity Cloud projects

The Data Productivity Cloud uses projects to logically group and separate workloads. You will be familiar with the concept of projects from Matillion ETL, and you may wish to create a set of projects and project folders that mirror your Matillion ETL project structure.

Different projects can have different users, permissions, and credentials associated with them in the Data Productivity Cloud, so you may want to take some time to plan this. Read Projects to understand what your options are when creating a Data Productivity Cloud project, and consider the following choices.

Full SaaS or Hybrid SaaS?

Read Matillion Full SaaS vs Hybrid SaaS for an explanation of the differences. For a Hybrid SaaS project, you install our agent software within your own cloud infrastructure and data plane. Read Create an agent in your infrastructure for details of how to set up the agent.

A Full SaaS project is the easiest way for a new customer to get started with learning the Data Productivity Cloud and to get initial pipelines running, but a Hybrid SaaS project is recommended if:

  • You are migrating workloads that need the functionality of a Hybrid SaaS configuration. For example:
    • Python scripting operates differently under Full SaaS versus Hybrid SaaS.
    • Hybrid SaaS gives you the flexibility to upload your own libraries and drivers.
  • Your data locality requirements need the agent to run in a specific region that Matillion doesn't currently provide agents in.
  • Your use case requires proximity between the data processing agent and your source systems.
  • You want the Data Productivity Cloud agent to run in the same network location as Matillion ETL.

This decision is made on a per-project basis, and you can run both types of project simultaneously if required.

Matillion-hosted Git or bring your own Git?

The Data Productivity Cloud uses Git version control to keep track of changes made to projects and facilitate collaboration. Read Git in Designer to learn more.

For each project, you must decide whether you will use the Data Productivity Cloud's own hosted Git repository, or whether you will connect to an external Git repository you control.

  • Connect your own Git repository: You connect a supported external Git repository to the Data Productivity Cloud. This is the recommended path, as it gives you much greater control and access to functionality within Git. This assumes you are already familiar with the use of Git repositories, and manage your own repository that you can connect.
  • Matillion-hosted Git: Matillion sets up and manages a Git repository on your behalf. If you don't already have your own Git repository, you can use this option for trials. This is the easiest option to configure and manage, but it limits your Git options: you don't have direct access to the repository from other tools, and you can't implement a fully automated CI/CD process or enforce a four-eyes review principle.

This choice is made when the project is created. If you decide to use your own Git repository, you must have your repository set up in advance using one of the supported third-party Git providers, and have the appropriate connection details and credentials available when you create the Data Productivity Cloud project.
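If you plan to connect your own repository, the preparation can be sketched as below. This is a minimal local example only: the repository name, branch name, and identity are placeholder assumptions, and after this you would push the repository to one of the supported third-party Git providers before creating the project.

```shell
# Minimal sketch of preparing a repository for the "connect your own
# Git repository" option. Names and identity are placeholders.
git init dpc-pipelines
cd dpc-pipelines
git checkout -b main                       # default branch for the project
git config user.email "dev@example.com"    # placeholder identity
git config user.name  "DPC Developer"
git commit --allow-empty -m "Initial commit for Data Productivity Cloud project"
git branch --list
```

Having the repository, its default branch, and your connection credentials ready in advance means project creation in the Data Productivity Cloud is a single uninterrupted step.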

The choice is made on a per-project basis, so you could create a project with Matillion-hosted Git for initial testing, then create a new project with your own repository for production workloads. For long-term production use, we recommend that you use your own Git provider, as the capabilities it can support will be much greater than those provided by Matillion-hosted Git.

Environments

A Data Productivity Cloud environment defines the connection between a project and your chosen cloud data platform. Environments include default configuration, such as a default warehouse, database, and schema, that can be used to pre-populate component properties in your pipelines. Ensure that you have your environment connection details and credentials available when you create your first project. Read Environments for details.

Credentials

Secrets, passwords, or OAuths that your pipelines will use to connect to third-party services must be created directly in the Data Productivity Cloud; for security reasons, we don't migrate these details. Ensure that you are aware which credentials your workloads need, and have the details available to create those credentials.

Read Secret definitions, Cloud provider credentials, and OAuth for details.

Branches

The Data Productivity Cloud is designed around the concept of Git branches to provide version control and collaboration features. This is similar to the optional Git integration feature in Matillion ETL, but in the Data Productivity Cloud it's not optional and all pipelines must be assigned to Git branches.

Regardless of whether you currently use Git in Matillion ETL or not, ensure you have read and understood Git in Designer.

Plan how you will use branches to contain your migrated pipelines. Decide whether you want a simple development/production branch structure, whether you want separate branches for different development teams, and so on. Your project will have a main branch created by default, but good practice is to perform all development work in dedicated development branches, and only merge the work into main when ready for production.
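The development/production structure described above can be illustrated with plain Git commands. The branch names here are suggestions, not Data Productivity Cloud requirements, and the empty commits simply stand in for pipeline development work.

```shell
# Illustrative development/production branch structure for migrated
# pipelines. Branch names and commits are placeholder assumptions.
git init branch-demo
cd branch-demo
git checkout -b main
git config user.email "dev@example.com"
git config user.name  "DPC Developer"
git commit --allow-empty -m "Production branch"

# Do all migration and development work on a dedicated branch.
git checkout -b develop
git commit --allow-empty -m "Import and test migrated pipelines"

# Merge into main only when the migrated pipelines are production-ready.
git checkout main
git merge --no-ff develop -m "Promote tested pipelines to production"
git log --oneline --graph
```

The same pattern extends to one branch per development team: each team merges into main independently once its migrated workload has been tested.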