
Upgrade: Shared jobs

There are some additional factors to consider when upgrading a Matillion ETL shared job to a Data Productivity Cloud shared pipeline. To correctly upgrade shared jobs, use the process given below.

Before doing this, ensure that you fully understand the concepts and use of both shared jobs in Matillion ETL and shared pipelines in the Data Productivity Cloud.

Best practice for shared pipelines is to create them in their own dedicated project that is separate from the projects that consume them. These instructions assume you will be doing that.


Video example

[Video: Migrating Matillion ETL shared jobs to shared pipelines in the Data Productivity Cloud.]


Upgrade path

  1. In Matillion ETL, unpack the shared jobs you want to export.

    Note

    If you have the original source of the shared jobs, you can skip this step and export the source instead.

  2. Export the unpacked jobs, as described in Export from Matillion ETL.

  3. In the Data Productivity Cloud, import the shared jobs, as described in Import to the Data Productivity Cloud. Ensure that you are importing into the project you are using to hold your shared pipelines.
  4. Refactor, test, and amend these pipelines as needed to ensure they perform the expected function in the Data Productivity Cloud.
  5. Share the pipelines, as described in Sharing a pipeline.
  6. In Matillion ETL, export the jobs that use the shared jobs.
  7. In the Data Productivity Cloud, import the exported jobs. These will become your consuming pipelines in the Data Productivity Cloud.
  8. Create a mapping to resolve any import issues that require refactoring. Read Shared job mappings, below, to learn how to do this.
  9. Refactor and test the imported pipeline to ensure it functions as expected and correctly calls the shared pipelines it needs.

Shared job mappings

The mapping feature gives you a way to resolve issues that arise when migrating shared jobs from Matillion ETL to Data Productivity Cloud shared pipelines.

During the import process, shared job components from Matillion ETL won't have direct equivalents in the Data Productivity Cloud, and will give the imported pipeline a Manual refactor status. A pipeline with this status won't validate or run until you edit it to replace the "unknown" components with suitably configured Data Productivity Cloud equivalents.

Mapping provides a mechanism for you to tell the Data Productivity Cloud how to perform these replacements automatically, across all your imported pipelines. You may still choose to manually edit the pipeline, but mapping gives you an alternative process that avoids the need to click through the configuration of all the pipelines and component properties via the canvas UI.

Mapping process

If the Importing files panel shows pipelines that require Manual refactor during the import, follow this process:

  1. Click Add mapping in the Manual refactor section of the panel.

  2. Enter the mapping information in the Add mappings for imported jobs dialog, in JSON format as described in Structure of the mapping, below. The dialog will autocomplete necessary syntax (such as }, ]) to ensure the JSON is well-formed.

    Note

    There is no mechanism for saving the mapping. If you intend to reuse a mapping (for example, the same shared job may be used in many different jobs that you will migrate to the Data Productivity Cloud at different times, all of which will require the same, or very similar, mapping), we recommend copying the text and saving it in an external text file.

  3. Click Apply & re-run.

  4. The Importing files panel should now show a pipeline status of Converted without changes, meaning the import will complete with no errors. You can now click Import to complete the import process.

Structure of the mapping

Mappings are defined using JSON syntax, and must follow a strict structure. The structure is best shown with an example:

{
    "orchestrationComponents": [
        {
            "id": "unknown-to-run-shared-pipeline",
            "metlImplementationId": 822451381,
            "pipelineName": "sp#python-printer",
            "version": "[Latest]",
            "parameters": [
                {
                    "metlName": "scalar_one",
                    "variable": "scalar_one",
                    "variableType": "SCALAR"
                },
                ...
            ]
        }
    ]
}

You only need to provide one mapping per unique metlImplementationId for all the pipelines being imported.
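
For example, if your imported jobs use two different Matillion ETL shared jobs, a single mapping can cover both by including one entry per metlImplementationId. The following sketch is purely illustrative: the IDs, pipeline names, and variable names are placeholders, not values taken from a real export.

{
    "orchestrationComponents": [
        {
            "id": "unknown-to-run-shared-pipeline",
            "metlImplementationId": 822451381,
            "pipelineName": "sp#python-printer",
            "version": "[Latest]",
            "parameters": [
                {
                    "metlName": "scalar_one",
                    "variable": "scalar_one",
                    "variableType": "SCALAR"
                }
            ]
        },
        {
            "id": "unknown-to-run-shared-pipeline",
            "metlImplementationId": 998877665,
            "pipelineName": "sp#audit-logger",
            "version": "[Latest]",
            "parameters": [
                {
                    "metlName": "log_level",
                    "variable": "log_level",
                    "variableType": "SCALAR"
                }
            ]
        }
    ]
}

Even if the same shared job appears in several of the jobs you are importing, it still needs only the single entry that corresponds to its metlImplementationId.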

The properties you need to set in this structure are as follows (a worked example follows the list):

  • id: The identifier string that tells the import function what mapping logic to run. This must be set to unknown-to-run-shared-pipeline.
  • metlImplementationId: The unique identifier that is given to the shared pipeline component in Matillion ETL. To find this, open the .json file created when the job was exported from Matillion ETL, and search for the numeric implementationID that corresponds to the unknown component you are mapping.
  • pipelineName: The name of the Data Productivity Cloud shared pipeline that this unknown component should map to.
  • version: The version of the Data Productivity Cloud shared pipeline. You can set this to [Latest], or to any specific version you want to map to. Read Shared Pipelines for an explanation of versioning.
  • parameters: Define the following for each of the shared pipeline's variables.
    • metlName: The name assigned to the variable in the Matillion ETL job.
    • variable: The name of the variable in the Data Productivity Cloud pipeline.
    • variableType: The type of the variable in the Data Productivity Cloud pipeline. Can be SCALAR or GRID.
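As a fuller illustration of these properties, the following sketch maps one unknown component to a shared pipeline that takes both a scalar and a grid variable, and pins the mapping to a specific version rather than [Latest]. All of the values shown (the metlImplementationId, pipeline name, version label, and variable names) are placeholders for this example; take the real values from your exported .json file and from the shared pipeline's definition.

{
    "orchestrationComponents": [
        {
            "id": "unknown-to-run-shared-pipeline",
            "metlImplementationId": 711234567,
            "pipelineName": "sp#load-and-report",
            "version": "1.0",
            "parameters": [
                {
                    "metlName": "target_table",
                    "variable": "target_table",
                    "variableType": "SCALAR"
                },
                {
                    "metlName": "column_grid",
                    "variable": "column_grid",
                    "variableType": "GRID"
                }
            ]
        }
    ]
}

The metlName and variable values happen to match here, but they don't have to: metlName is the variable name used in the Matillion ETL job, while variable is whatever the corresponding variable is called in the Data Productivity Cloud shared pipeline.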
