
Upgrade considerations

There are two different paths to upgrading to the Data Productivity Cloud from Matillion ETL:

  • Self Serve, which is detailed in these pages.
  • Assisted, where you are supported by Matillion or a Matillion partner. Contact your Matillion Account Manager to discuss support for your upgrade project.

Before upgrading, you need to complete some prerequisite steps to set up the Data Productivity Cloud, and make some decisions about how you will use it. These are discussed below.


What do you need to consider before upgrading?

  • The Data Productivity Cloud supports Snowflake, Databricks, and Amazon Redshift.
  • Snowflake on GCP is only available in the Data Productivity Cloud in a Full SaaS environment at this time.
  • Google BigQuery isn't supported in the Data Productivity Cloud at this time.
  • Azure Synapse isn't supported in the Data Productivity Cloud at this time.

As a Matillion ETL user, you'll find that the Data Productivity Cloud has a very familiar look and feel, but there are some small terminology differences you'll need to bear in mind when reading the documentation:

Matillion ETL        | Data Productivity Cloud
---------------------|------------------------
Job                  | Pipeline
Job variable         | Pipeline variable
Environment variable | Project variable

The Data Productivity Cloud and Matillion ETL have different architectures, and we recommend familiarizing yourself with how the Data Productivity Cloud operates before you upgrade.

You can start using the Data Productivity Cloud in parallel with Matillion ETL, allowing you to test and evaluate features, and even to build and run new production pipelines, without migrating any workloads until you're ready. You also don't have to migrate every workload as a single operation; you can migrate a single workload and get it into production in the Data Productivity Cloud before moving on to the next.


Data Productivity Cloud projects

The Data Productivity Cloud uses projects to logically group and separate workloads. You will be familiar with the concept of projects from Matillion ETL, and you may wish to create a set of projects and project folders that mirror your Matillion ETL project structure.

Different projects can have different users, permissions, and credentials associated with them in the Data Productivity Cloud, so you may want to take some time to plan this. Read Projects to understand what your options are when creating a Data Productivity Cloud project, and consider the following choices.

Full SaaS or Hybrid SaaS?

Read Matillion Full SaaS vs Hybrid SaaS for an explanation of the differences. For a Hybrid SaaS project, you install our agent software within your own cloud infrastructure and data plane. Read Create an agent in your infrastructure for details of how to set up the agent.

A Full SaaS project is the easiest way for a new customer to get started with learning the Data Productivity Cloud and to get initial pipelines running, but a Hybrid SaaS project is recommended if:

  • You are migrating workloads that need the functionality of a Hybrid SaaS configuration. For example:
    • Python scripting operates differently under Full SaaS versus Hybrid SaaS.
    • Hybrid SaaS gives you the flexibility to upload your own libraries and drivers (see the sketch after this list).
  • Your data locality requirements need the agent to run in a specific region that Matillion doesn't currently provide agents in.
  • Your use case requires proximity between the data processing agent and your source systems.
  • You want the Data Productivity Cloud agent to run in the same network location as Matillion ETL.
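
To illustrate the custom library point above: if a migrated Python script depends on a package or driver that isn't available in the managed Full SaaS Python environment, a Hybrid SaaS agent lets you install it yourself. The sketch below is purely illustrative, and the package name "acme_internal_sdk" is a made-up placeholder, not part of any Matillion API:

```python
# Hypothetical check you might run inside a Python script to see whether a
# required third-party or in-house library is present in the environment.
# "acme_internal_sdk" is a placeholder for an in-house package.
import importlib.util

REQUIRED_PACKAGES = ["pandas", "acme_internal_sdk"]

for name in REQUIRED_PACKAGES:
    if importlib.util.find_spec(name) is None:
        # In Full SaaS you can't install this yourself; a Hybrid SaaS agent
        # lets you add the missing library or driver to the environment.
        print(f"Missing package: {name} - may need a Hybrid SaaS agent")
    else:
        print(f"Package available: {name}")
```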

This decision is made on a per-project basis, and you can run both types of project simultaneously if required.

Bring your own Git

The Data Productivity Cloud uses Git version control to keep track of changes made to projects and facilitate collaboration. Read Git in Designer to learn more.

We recommend connecting your own Git repository to the Data Productivity Cloud, as this gives you much greater control and access to functionality within Git. This path assumes you are already familiar with using Git repositories, and that you manage your own repository that you can connect.

Warning

Reusing an existing Matillion ETL Git repository isn't supported and can cause issues with your pipelines.

You must have your repository set up in advance using one of the supported third-party Git providers, and have the appropriate connection details and credentials available when you create the Data Productivity Cloud project.

Environments

A Data Productivity Cloud environment defines the connection between a project and your chosen cloud data platform. Environments include default configuration, such as a default warehouse, database, and schema, that can be used to pre-populate component properties in your pipelines. Ensure that you have your environment connection details and credentials available when you create your first project. Read Environments for details.
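
You enter these connection details through the Data Productivity Cloud interface rather than in code, but as an illustration of the kind of information to have ready, the sketch below uses the Snowflake Python connector with placeholder values (this assumes Snowflake is your platform; the account, role, and object names are hypothetical, and this is not how the Data Productivity Cloud itself connects):

```python
# Illustrative only: the same categories of detail (account, credentials,
# default warehouse/database/schema) that an environment asks for, shown
# here via the Snowflake Python connector. All values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",   # hypothetical account identifier
    user="MIGRATION_USER",
    password="********",           # store as a secret, never in code
    role="MIGRATION_ROLE",
    warehouse="DEV_WH",            # default warehouse for the environment
    database="ANALYTICS",          # default database
    schema="STAGING",              # default schema
)
conn.cursor().execute(
    "SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA()"
)
```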

Credentials

Secrets, passwords, and OAuth credentials that your jobs use to connect to third-party services must be recreated directly in the Data Productivity Cloud; for security reasons, we don't migrate these details. Ensure you know which credentials your workloads need, and have the details available to recreate them.

Read Secrets and secret definitions, Cloud provider credentials, and OAuth for details.
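
How you recreate each credential depends on its type and on where your project stores secrets. As one hedged example, if your Hybrid SaaS agent reads secrets from AWS Secrets Manager (an assumption; your project may use a different secret store), a script like the following could pre-create the secrets your migrated workloads will reference. The secret names and values are placeholders:

```python
# Hypothetical sketch: pre-creating secrets in AWS Secrets Manager so they
# exist before you define the corresponding secrets in the Data Productivity
# Cloud. Assumes boto3 is installed and AWS credentials are configured.
import boto3

secrets_to_create = {
    "dpc/salesforce-password": "example-password",        # placeholder value
    "dpc/sftp-private-key-passphrase": "example-phrase",  # placeholder value
}

client = boto3.client("secretsmanager", region_name="eu-west-1")

for name, value in secrets_to_create.items():
    client.create_secret(Name=name, SecretString=value)
    print(f"Created secret: {name}")
```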

Branches

The Data Productivity Cloud is designed around the concept of Git branches to provide version control and collaboration features. This is similar to the optional Git integration feature in Matillion ETL, but in the Data Productivity Cloud it's not optional and all pipelines must be assigned to Git branches.

Regardless of whether you currently use Git in Matillion ETL or not, ensure you have read and understood Git in Designer.

Plan how you will use branches to contain your migrated pipelines. Decide whether you want a simple development/production branch structure, whether you want separate branches for different development teams, and so on. Your project will have a main branch created by default, but good practice is to perform all development work in dedicated development branches, and only merge the work into main when ready for production.


Task concurrency

You need to consider how you will manage concurrency after upgrading, because the Data Productivity Cloud architecture can support more concurrent pipeline runs than Matillion ETL supports concurrent jobs. In most cases this results in improved performance without any issues, but there are some edge-case scenarios, outlined below, that you should be aware of and mitigate if necessary.

In Matillion ETL, the number of concurrent tasks that can run is determined by instance size. In the Data Productivity Cloud, the number of concurrent tasks is determined by the number of agent instances you have running, and can therefore scale much higher than concurrency in Matillion ETL.

A Data Productivity Cloud pipeline execution is tied to one agent, but can use all of the agent instances within that agent; each agent instance supports up to 20 concurrent tasks. For example, with one agent scaled to eight agent instances, you can have up to 160 concurrent task executions for a single pipeline execution. With eight separate agents each scaled to a single instance, the maximum concurrency for any one pipeline is 20.
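
As a back-of-the-envelope check before you upgrade, you can work out the per-pipeline concurrency your planned agent layout allows. The sketch below assumes the figure of 20 concurrent tasks per agent instance implied by the examples above:

```python
# Rough concurrency estimate for a planned agent layout. Assumes 20
# concurrent tasks per agent instance, as in the examples above.
TASKS_PER_INSTANCE = 20

def max_concurrent_tasks_per_pipeline(instances_in_agent: int) -> int:
    """A pipeline execution is tied to one agent, so its concurrency is
    bounded by that agent's instance count."""
    return instances_in_agent * TASKS_PER_INSTANCE

print(max_concurrent_tasks_per_pipeline(8))  # one agent, eight instances -> 160
print(max_concurrent_tasks_per_pipeline(1))  # a single-instance agent     -> 20
```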

This greater concurrency may cause issues in the following scenarios:

  • The greater throughput from concurrent tasks in the Data Productivity Cloud may cause you to hit limits in your cloud data warehouse, with requests queuing and timing out before being processed. To mitigate this, you can refactor pipelines to reduce the load on your cloud data warehouse at any one time, or you may be able to configure your cloud data warehouse to handle the request queue more efficiently (see the sketch after this list).
  • Two processes in Matillion ETL may have always run sequentially simply because the limit on concurrent processes prevented them from running at the same time, even though there was no explicit link between them. In the Data Productivity Cloud, those processes may run concurrently or in a different order, which could cause issues if there are dependencies between them. Review your pipelines to identify any cases where this could be a problem, and add links between processes where necessary to ensure they run in the correct order.
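
For the first scenario, the exact mitigation depends on your cloud data warehouse. As a hedged example, if you use Snowflake, the warehouse parameters MAX_CONCURRENCY_LEVEL and STATEMENT_QUEUED_TIMEOUT_IN_SECONDS control how many statements run at once and how long queued statements wait before timing out. The sketch below shows how you might adjust them; the connection details, warehouse name, and values are placeholders, and you should size them for your own workload:

```python
# Hypothetical Snowflake example: tune how a warehouse handles queued
# requests from highly concurrent Data Productivity Cloud pipelines.
# Connection details, warehouse name, and parameter values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",
    user="ADMIN_USER",
    password="********",
    role="SYSADMIN",
)

for stmt in (
    # How many statements run per cluster before new ones start queuing.
    "ALTER WAREHOUSE DPC_WH SET MAX_CONCURRENCY_LEVEL = 12",
    # Fail statements that have sat in the queue for more than 10 minutes.
    "ALTER WAREHOUSE DPC_WH SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 600",
):
    conn.cursor().execute(stmt)
```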