Shared pipelines
A shared pipeline is a way of saving an entire pipeline workflow and then re-using that same workflow in any project in your Data Productivity Cloud.
This feature helps maintain consistency across multiple projects. For example, a designated "library" project could be used for functionality that must be used in a consistent way across the organization. This functionality would be created as version-controlled shared pipelines, and all other users would be instructed to use these pipelines in their own projects, using the Run Shared Pipeline component.
Shared pipelines have similarities to the Run Orchestration and Run Transformation components, but allow pipelines to be shared between projects, and give you more options for pipeline versioning.
If you use multiple cloud data warehouses across different projects, your shared pipelines can be used in projects that connect to any warehouse. However, you need to take care when using components that aren't available, or aren't identical in operation, across all warehouses.
Any pipeline can become a shared pipeline. However, there are some best-practice considerations, described below, that you should be aware of when creating a pipeline that you intend to share. For full details of creating pipelines, read Pipelines.
Note
We call a pipeline that makes use of a shared pipeline the consumer or consuming pipeline.
Sharing a pipeline
- Create your pipeline.
- In the Pipelines tab, click ... next to the pipeline you want to share and then click Share.
- Commit the pipeline's branch, as described in Git commit.
- Push and Publish the branch, as described in Git push local changes. Publishing is required to make it available to other pipelines.
Note
When a pipeline is shared, Designer either creates or updates the configuration in a file named shared-pipelines.yaml. You must not rename or move this file, and we recommend that you do not attempt to manually edit its contents. See Editing the configuration file, below, for further information.
Using a shared pipeline
Use the Run Shared Pipeline component to add the shared pipeline to any orchestration pipeline.
The default values for any pipeline variables set in the original pipeline will be used in the consuming pipeline, unless new values are set in the Run Shared Pipeline component.
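The override rule can be pictured as a simple dictionary merge. The following Python sketch is only a model of the behavior described above, not Designer's implementation. The function name and the variable names are hypothetical, and it assumes all variables have Public visibility.

```python
def effective_variables(defaults, overrides):
    """Return the values a shared pipeline would run with.

    defaults  -- default values defined in the original (shared) pipeline
    overrides -- values set in the Run Shared Pipeline component
    """
    # Values set by the consumer win; everything else keeps its default.
    return {**defaults, **overrides}


# Hypothetical variables, for illustration only.
defaults = {"target_schema": "staging", "batch_size": "500"}
overrides = {"target_schema": "analytics"}
values = effective_variables(defaults, overrides)
```

Here target_schema takes the consumer's value, while batch_size falls back to the default from the original pipeline.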
Stop sharing a pipeline
To stop sharing a pipeline, click ... next to the pipeline you want to stop sharing and then click Unshare.
This will make the pipeline disappear from the drop-down in the Run Shared Pipeline component, so it can't be used by any new consumers. However, pipelines already consuming it will continue to run the latest available version after sharing is stopped. A Run Shared Pipeline component will show as "invalid" to warn that it's referencing an unshared pipeline, but it will continue to run correctly using the selected version.
Best practices for shared pipelines
When creating a shared pipeline, be aware of the following best practices.
General principles
- Avoid using the same name for both a transformation and an orchestration pipeline in the same folder, as this will cause confusion over which is referenced by the Run Shared Pipeline component.
- Avoid changing the name of a project containing shared pipelines. If you change the project name, any Run Shared Pipeline components currently configured to use a pipeline from that project will continue to run, but won't be able to access subsequent changes you may make to the shared pipeline, unless the consumer re-points the Run Shared Pipeline component to the new project name.
- If you use multiple cloud data warehouses, your shared pipelines can be used in projects that connect to any warehouse, but you need to take care to avoid component incompatibilities between warehouses. One suggestion is to include the warehouse type as part of the pipeline name, to alert the consumer that it might not be compatible with the warehouse they are using.
Consuming pipelines
- Avoid self-referencing a shared pipeline in the same project it came from. If both are shared, there will be no way to distinguish them in the Run Shared Pipeline component.
- Take care if referencing a shared pipeline in a pipeline that will itself be shared. This can create issues with circular dependencies and make troubleshooting difficult.
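If you keep a record of which shared pipelines call which, a circular dependency can be spotted before it causes trouble. The following Python sketch assumes you maintain such a mapping yourself. The mapping, the pipeline names, and the find_cycle function are all hypothetical and aren't part of Designer.

```python
def find_cycle(references):
    """Return one circular chain of shared pipelines, or None.

    references maps a pipeline name to the shared pipelines it calls.
    """
    visiting, done = [], set()

    def visit(name):
        if name in done:
            return None
        if name in visiting:
            # Reached a pipeline that is already on the current path.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for called in references.get(name, []):
            cycle = visit(called)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in references:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


# Hypothetical pipelines: "load" calls "clean", which calls "load" again.
references = {"load": ["clean"], "clean": ["load"], "report": ["load"]}
cycle = find_cycle(references)
```

A result other than None lists the chain of pipelines that leads back to its own starting point.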
Variables
- Avoid hard coding [Environment Default] into the component parameters of a shared pipeline. It's better to use pipeline variables to define parameters wherever possible, and the value [Environment Default] can then be set in the consuming pipeline, as this makes all declarations explicit for the consumer.
- Avoid using project variables within a shared pipeline, as the pipeline won't work if consumed in a project without those variables defined. It's better to use pipeline variables only, as these can then be given suitable new values by the consumer.
- Consider deleting the default values of any variables that the consumer is required to set. This will avoid any uncertainty over whether the consumer should use that default value or set their own value in Run Shared Pipeline. This only applies to variables with Public visibility.
- If you want to set a variable value that can't be overridden by the consumer, set the variable visibility to Private.
- Consider using error trapping in the shared pipeline, to verify that the consumer isn't giving invalid values to a variable used by the shared pipeline. You could do this by checking the variable values in an If component and passing the pipeline flow to an End Failure component if an invalid value is found.
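In Designer, error trapping is built from components rather than written as code. As a rough model of the conditions an If component might evaluate before passing the flow to End Failure, consider the following Python sketch. The variable names and allowed values are hypothetical.

```python
ALLOWED_LOAD_TYPES = {"full", "incremental"}  # hypothetical valid values


def check_inputs(variables):
    """Return a list of problems found in consumer-supplied values."""
    problems = []
    # A required input with no default must be set by the consumer.
    if not variables.get("source_table"):
        problems.append("source_table is required")
    # Reject values outside the set the pipeline knows how to handle.
    if variables.get("load_type") not in ALLOWED_LOAD_TYPES:
        problems.append("load_type must be one of: full, incremental")
    return problems


problems = check_inputs({"source_table": "", "load_type": "delta"})
# A non-empty list corresponds to routing the flow to End Failure.
should_fail = bool(problems)
```

Checking every input at the start of the shared pipeline means the consumer sees a clear failure early, rather than an obscure error later in the run.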
Versioning and release workflow
Shared pipelines are versioned as part of the project, to ensure that any dependencies are versioned together.
When using shared pipelines, you should consider a version management strategy that works best for you. Some possible approaches are:
- Consumers choose a particular version of the project to take the shared pipeline from. This avoids unexpected "breaking" changes to the shared pipeline being pushed to the consumer, as the owner of the consuming pipeline can choose when and how often to update to a newer project version. The disadvantage is that the consumers won't automatically inherit necessary updates made to the shared pipeline.
  If you parameterize the Version property of your Run Shared Pipeline components, using Variables, you can easily upgrade multiple consuming pipelines in a single operation by updating the project variable.
- Create a new shared pipeline for each new "breaking" change to the pipeline. The existing pipeline is only updated for essential, non-breaking changes. The consumer could then select "Latest" as the shared pipeline version, so they automatically receive all essential changes from the latest version but have confidence that no breaking changes will be forced on them. Some disadvantages of this approach are:
  - "Legacy" shared pipelines need to be maintained in the project for as long as they are being consumed, and a deprecation strategy may be needed to deal with an ever-expanding number of versioned pipelines.
  - Consumers need to trust the shared pipeline producers not to apply breaking changes to a shared pipeline they are consuming.
  If using this strategy, you can parameterize the name of the pipeline being called by your Run Shared Pipeline components, using a project variable, to allow easy upgrading to a named new version when an old version is deprecated. For example, project-name#pipeline-name-${pipeline_version}.
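To illustrate how a single variable controls which named pipeline is called, the following Python sketch resolves the example pattern above with the standard library's string.Template, which happens to use the same ${name} placeholder syntax. Designer performs its own variable substitution; this is only an illustration, and the values v1 and v2 are hypothetical.

```python
from string import Template

# Pattern taken from the example above; pipeline_version is a project variable.
reference = Template("project-name#pipeline-name-${pipeline_version}")

# Changing the variable's value re-points every component that uses the pattern.
current = reference.substitute(pipeline_version="v1")
upgraded = reference.substitute(pipeline_version="v2")
```

Because every consuming component uses the same pattern, changing the variable once moves all of them to the new named pipeline.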
Note
In the context of shared pipelines, "breaking" changes are any changes that could cause the consuming pipeline to fail. For example, adding or removing pipeline variables that the consumer is trying to map values to.
Editing the configuration file
Shared pipeline configuration is stored in a file called shared-pipelines.yaml, found in the .matillion folder of your Git repository.
Warning
You must never move or rename this file.
Note
This is only applicable if you are using your own Git provider. On Matillion-hosted Git, this file is not accessible.
The file is automatically updated when changes are made to shared pipelines, so you should never normally need to manually edit the file, and we recommend that you don't attempt to. However, there may be occasions when editing is required, for example to resolve a conflict arising during merging. In these cases, it's important to understand the structure of the file.
In shared-pipelines.yaml, each pipeline is listed, along with the properties needed to define the pipeline. The structure is as follows:
    version: "1.0"
    type: shared-pipelines-config
    pipelines:
      # <pipeline name> is the full path of the pipeline, for example folder1/sub-folder1/Shared Pipeline.orch.yaml
      - pipeline: <pipeline name>
        # <display name> is not actively used
        displayName: <display name>
        # <pipeline-id> is the value set the first time the pipeline is shared, and is what the consuming user selects. It can only contain letters, numbers and the following characters: - _ . ! ~ * ' ( )
        id: <pipeline-id>
        # enabled reflects the current sharing status of the pipeline. If the pipeline is unshared, this value is set to 'false', which allows the ID to persist in case the pipeline is shared again in a future version of the project
        enabled: true
        # parameters lists the public scalar and grid variables in the root pipeline. The example below shows two variables, var-1 and var-2
        parameters:
          var-1:
            type: TEXT
            description: ""
            scope: "COPIED"
          var-2:
            type: TEXT
            description: ""
            scope: "COPIED"
If you need to edit this file for any reason, take care to avoid the following errors, which would cause the pipeline to fail:
- Duplicate, missing or mis-spelled object properties.
- Mandatory properties having a blank value.
- Two or more pipelines with the same <pipeline-id>.
- Invalid pipeline reference.
- <pipeline-id> containing invalid characters.
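If you do have to edit the file by hand, a small script can catch some of these errors before you commit. The following Python sketch checks a configuration that has already been parsed into a dictionary, for example with a YAML parser such as PyYAML's yaml.safe_load. It's an illustration rather than an official validator: the check_config function is hypothetical, the set of properties treated as mandatory is an assumption, and "letters" is assumed to mean ASCII letters.

```python
import re

# Letters, numbers and - _ . ! ~ * ' ( ) as listed in the file comments.
# Assumption: "letters" means ASCII letters only.
VALID_ID = re.compile(r"[A-Za-z0-9\-_.!~*'()]+")

# Assumption: these are the properties treated as mandatory.
REQUIRED = ("pipeline", "id", "enabled")


def check_config(config):
    """Return a list of problems found in a parsed shared-pipelines.yaml."""
    problems, seen_ids = [], set()
    for index, entry in enumerate(config.get("pipelines", [])):
        for key in REQUIRED:
            # 'enabled: false' is a valid value, so only None and "" count as blank.
            if entry.get(key) in (None, ""):
                problems.append(f"entry {index}: '{key}' is missing or blank")
        pipeline_id = entry.get("id")
        if not pipeline_id:
            continue
        if not VALID_ID.fullmatch(str(pipeline_id)):
            problems.append(f"entry {index}: id '{pipeline_id}' has invalid characters")
        if pipeline_id in seen_ids:
            problems.append(f"entry {index}: duplicate id '{pipeline_id}'")
        seen_ids.add(pipeline_id)
    return problems


# Hypothetical configuration containing three deliberate mistakes.
config = {
    "version": "1.0",
    "type": "shared-pipelines-config",
    "pipelines": [
        {"pipeline": "folder1/Load.orch.yaml", "id": "load", "enabled": True},
        {"pipeline": "folder1/Clean.orch.yaml", "id": "load", "enabled": True},
        {"pipeline": "", "id": "bad id", "enabled": False},
    ],
}
problems = check_config(config)
```

The sketch doesn't verify that each pipeline path points to a real file, so an invalid pipeline reference still needs to be checked against the repository.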