Upgrade considerations
There are two paths to upgrading from Matillion ETL to the Data Productivity Cloud:
- Self Serve, which is detailed in these pages.
- Assisted, where you are supported by Matillion or a Matillion partner. Contact your Matillion Account Manager to discuss support for your upgrade project.
There are some prerequisite steps you need to take to set up the Data Productivity Cloud before upgrading, and some decisions you need to make about how you will use the Data Productivity Cloud, which are discussed below.
What do you need to consider before upgrading?
- The Data Productivity Cloud supports Snowflake, Databricks, and Amazon Redshift.
- Snowflake on GCP is only available in the Data Productivity Cloud in a Full SaaS environment at this time.
- Google BigQuery isn't supported in the Data Productivity Cloud at this time.
- Azure Synapse isn't supported in the Data Productivity Cloud at this time.
As a Matillion ETL user, you'll find that the Data Productivity Cloud has a very familiar look and feel, but there are some small terminology differences you'll need to bear in mind when reading the documentation:
| Matillion ETL | Data Productivity Cloud |
| --- | --- |
| Job | Pipeline |
| Job variable | Pipeline variable |
| Environment variable | Project variable |
The Data Productivity Cloud and Matillion ETL have different architectures, and we recommend reading the following to gain an understanding of how the Data Productivity Cloud operates:
- Data Productivity Cloud architecture.
- The SaaS delivery model.
- The billing model.
- Data Productivity Cloud security and the security information in the Matillion Trust Center.
You can start using the Data Productivity Cloud in parallel with Matillion ETL, allowing you to test and evaluate features, and even to build and run new production pipelines, without migrating any workloads until you're ready. You also don't have to migrate every workload as a single operation—you can migrate a single workload, and get that into production in the Data Productivity Cloud before moving onto the next.
Data Productivity Cloud projects
The Data Productivity Cloud uses projects to logically group and separate workloads. You will be familiar with the concept of projects from Matillion ETL, and you may wish to create a set of projects and project folders that mirror your Matillion ETL project structure.
Different projects can have different users, permissions, and credentials associated with them in the Data Productivity Cloud, so you may want to take some time to plan this. Read Projects to understand what your options are when creating a Data Productivity Cloud project, and consider the following choices.
Full SaaS or Hybrid SaaS?
Read Matillion Full SaaS vs Hybrid SaaS for an explanation of the differences. For a Hybrid SaaS project, you install our agent software within your own cloud infrastructure and data plane. Read Create an agent in your infrastructure for details of how to set up the agent.
A Full SaaS project is the easiest way for a new customer to get started with learning the Data Productivity Cloud and to get initial pipelines running, but a Hybrid SaaS project is recommended if:
- You are migrating workloads that need the functionality of a Hybrid SaaS configuration. For example:
- Python scripting operates differently under Full SaaS versus Hybrid SaaS.
- Hybrid SaaS gives you the flexibility to upload your own libraries and drivers (see the sketch after this list).
- Your data locality requirements need the agent to run in a specific region that Matillion doesn't currently provide agents in.
- Your use case requires proximity between the data processing agent and your source systems.
- You want the Data Productivity Cloud agent to run in the same network location as Matillion ETL.
This decision is made on a per-project basis, and you can run both types of project simultaneously if required.
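To illustrate the library-flexibility point above, here is a minimal, hedged sketch of a Python script that depends on an extra third-party package. The package name is only a stand-in for whatever your existing Matillion ETL Python scripts import; the assumption here, which you should verify against the current documentation, is that a Hybrid SaaS agent lets you install such packages while Full SaaS limits you to a preinstalled set.

```python
# Illustrative only: "paramiko" stands in for any extra package your existing
# Python scripts need. Whether the import resolves depends on the runtime:
# a Hybrid SaaS agent lets you upload your own libraries, while Full SaaS
# restricts you to the preinstalled set (an assumption to verify).
try:
    import paramiko  # placeholder third-party dependency
except ImportError:
    raise SystemExit(
        "Dependency not available in this runtime; a Hybrid SaaS agent "
        "would let you install your own libraries."
    )

print("Custom dependency available:", paramiko.__version__)
```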
Bring your own Git
The Data Productivity Cloud uses Git version control to keep track of changes made to projects and facilitate collaboration. Read Git in Designer to learn more.
We recommend connecting your own Git repository to the Data Productivity Cloud, as this gives you much greater control and access to functionality within Git. This path assumes you are already familiar with using Git repositories, and manage your own repository that you can connect.
You must have your repository set up in advance using one of the supported third-party Git providers, and have the appropriate connection details and credentials available when you create the Data Productivity Cloud project.
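Before creating the project, it can be worth confirming that the repository is reachable with the credentials you intend to use. The following is a minimal sketch, assuming Git is installed locally; the repository URL is a placeholder.

```python
# Hedged pre-flight check: `git ls-remote` authenticates against the remote
# and lists its refs without cloning anything. The URL below is a placeholder.
import subprocess

repo_url = "https://github.com/example-org/dpc-pipelines.git"  # placeholder
subprocess.run(["git", "ls-remote", repo_url], check=True)
```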
Environments
A Data Productivity Cloud environment defines the connection between a project and your chosen cloud data platform. Environments include default configuration, such as a default warehouse, database, and schema, that can be used to pre-populate component properties in your pipelines. Ensure that you have your environment connection details and credentials available when you create your first project. Read Environments for details.
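If you want to sanity-check those connection details before entering them, a short script run outside the Data Productivity Cloud can help. This is a minimal sketch assuming a Snowflake platform and the snowflake-connector-python package; every identifier below is a placeholder.

```python
# Hedged sketch: verify the defaults you plan to enter into an environment.
# All identifiers are placeholders; supply the password from a secret store.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",    # placeholder account identifier
    user="etl_user",         # placeholder user
    password="********",     # don't hard-code real credentials
    warehouse="COMPUTE_WH",  # default warehouse for the environment
    database="ANALYTICS",    # default database
    schema="PUBLIC",         # default schema
)
with conn.cursor() as cur:
    cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA()")
    print(cur.fetchone())
conn.close()
```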
Credentials
Secrets, passwords, and OAuth credentials that your jobs use to connect to third-party services must be recreated directly in the Data Productivity Cloud; for security reasons, we don't migrate these details. Ensure you know which credentials your workloads need, and have the details available to create those credentials.
Read Secrets and secret definitions, Cloud provider credentials, and OAuth for details.
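As one hedged example of preparing a credential ahead of time: if your project is Hybrid SaaS and its agent reads secrets from AWS Secrets Manager (an assumption to confirm in the linked pages for your setup), you could pre-create a secret with boto3. All names, regions, and values below are placeholders.

```python
# Hedged sketch: pre-create a secret in AWS Secrets Manager for a Hybrid SaaS
# agent to reference. Secret name, region, and value are all placeholders.
import json
import boto3

client = boto3.client("secretsmanager", region_name="eu-west-1")
client.create_secret(
    Name="dpc/salesforce-api",  # placeholder secret name
    SecretString=json.dumps({"password": "example-only"}),
)
```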
Branches
The Data Productivity Cloud is designed around the concept of Git branches to provide version control and collaboration features. This is similar to the optional Git integration feature in Matillion ETL, but in the Data Productivity Cloud it's not optional and all pipelines must be assigned to Git branches.
Regardless of whether you currently use Git in Matillion ETL, ensure you have read and understood Git in Designer.
Plan how you will use branches to contain your migrated pipelines. Decide whether you want a simple development/production branch structure, separate branches for different development teams, and so on. Your project will have a `main` branch created by default, but good practice is to perform all development work in dedicated development branches, and only merge that work into `main` when it's ready for production.
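As a concrete, hedged illustration of that flow, the sketch below drives the branch operations through Git from Python. The branch name is hypothetical, and the working directory is assumed to be the repository connected to your project.

```python
# Hedged illustration of a development-to-main promotion flow. Assumes git is
# installed and the working directory is your connected repository.
import subprocess

def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

git("checkout", "main")
git("checkout", "-b", "dev/orders-migration")  # hypothetical development branch
# ... migrate and test pipelines on the development branch, committing as you go ...
git("checkout", "main")
git("merge", "dev/orders-migration")           # promote when production-ready
```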