Pipelines
Pipelines are the Data Productivity Cloud's way of designing, organizing, and executing workflows. You use Designer to build strings of configured components inside a pipeline and then run that pipeline to carry out a desired task such as loading or transforming data.
Pipelines are created, configured, and managed through the main Designer user interface.
There are two types of pipeline in Data Productivity Cloud: orchestration and transformation.
- Orchestration pipelines deal with loading data from a source system into a target data warehouse. Typical orchestration components are connectors, flow logic components, and scripting components.
- Transformation pipelines deal with transforming table data that exists in your target data warehouse, typically after loading that data with an orchestration pipeline. Transformation components are often analogs of SQL operations such as creating and deleting tables, joining data, or performing calculations.
Components are the basic building blocks of pipelines. Each component is specifically applicable to one type of pipeline, orchestration or transformation, and can't be added to the other type of pipeline. To learn more about components, read Components overview.
Any pipeline can be saved as a shared pipeline, which can then be referenced by any other pipeline in any project in your Data Productivity Cloud. This feature helps maintain consistency of core functionality over multiple different projects across the organization.
Adding a pipeline
To create a new pipeline:
- In the Designer user interface, click the Pipelines tab.
- Click Add at the top of the Pipelines tab, and select either Orchestration pipeline or Transformation pipeline.
- Enter a name for the pipeline. The name may include alphanumerics, underscores, single spaces, parentheses, and hyphens. The name must be unique within each pipeline folder.
- Click Add. The new pipeline immediately opens in a new tab on the canvas.
This will place the pipeline at the top level of your pipeline folder tree. To create a pipeline within a folder, read Pipeline folders, below.
The newly created pipeline is blank at this stage. You now need to add components to the pipeline canvas to construct a workflow that will perform your data extraction and transformation tasks.
When you start using Designer, you can also use the Orchestration pipeline or Transformation pipeline tiles to create your first pipelines, or click the Watch video links to see a short video tutorial on pipeline creation.
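The naming rule given in the steps above can be checked before you type a name into Designer. The following is a minimal sketch, not Designer's own validation: the function name is hypothetical, and it assumes that "alphanumerics" means ASCII letters and digits and that "single spaces" means no consecutive, leading, or trailing spaces.

```python
import re

# Assumption: ASCII letters and digits only; words separated by exactly one
# space; no leading or trailing space. Designer's real rule may differ.
_PIPELINE_NAME_RE = re.compile(r"[A-Za-z0-9_()\-]+(?: [A-Za-z0-9_()\-]+)*")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True if `name` uses only the documented characters."""
    return _PIPELINE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Load orders (daily)_v2", "Load  orders", "orders!"]:
        print(repr(candidate), is_valid_pipeline_name(candidate))
```

The check covers characters only. Uniqueness within a folder still has to be confirmed against the Pipelines tab.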
Managing pipelines
In the Designer user interface, click the Pipelines tab to view your pipelines. The Pipelines tab lists every pipeline you have created on your current branch. Icons identify the type of pipeline: a blue O for orchestration, or a green T for transformation. The pipelines can be organized into a hierarchy of folders, if required.
To view a pipeline's details on the canvas, do one of the following:
- Double-click the pipeline in the list.
- Click the pipeline to select it, then right-click and select Open pipeline.
- Hover over the pipeline, click the ... button that appears next to it, and click Open pipeline.
- Single-click the pipeline to select it and press Enter.
To delete a pipeline, do one of the following:
- Click the pipeline to select it, then right-click and select Delete pipeline.
- Hover over the pipeline, click the ... button that appears next to it, and click Delete pipeline.
Then click Delete to confirm the deletion.
Deleted pipelines can't be recovered.
Note
Deleting a pipeline may break other pipelines that reference it via Run Orchestration or Run Transformation components. You will therefore need to update any pipelines that reference this deleted pipeline.
To rename a pipeline, do one of the following:
- Click the pipeline to select it, then right-click and select Rename pipeline.
- Hover over the pipeline, click the ... button that appears next to it, and click Rename pipeline.
Note
Renaming a pipeline may break other pipelines that reference it. You will need to update any Run Orchestration or Run Transformation components that reference this pipeline with the new name.
Any existing schedules for a renamed pipeline will not be updated. You will need to create a new schedule using the new name.
Cancelling a running pipeline
You can cancel a pipeline while it's running in one of two ways. A soft cancellation allows the current task to complete, after which no other pipeline tasks will run. A forced cancellation interrupts and terminates the current task, sets the pipeline status to "Stopped", and prevents the pipeline from receiving further information from the agent.
Click the Task history tab, and then click the "X" icon on the far right of the task you wish to cancel. A single click performs a soft cancellation. Clicking a second time requests a forced cancellation, and a pop-up dialog appears asking you to confirm it. Click Yes, force cancel to terminate the pipeline without waiting for any in-progress steps to finish.
Pipeline folders
A branch may contain a large number of pipelines, which by default are listed alphabetically in the Pipelines tab. To organize pipelines in a branch, you can arrange them into named folders. Folders can be nested inside folders, up to 10 levels deep, to create a structure that makes sense to you and your team. Pipeline names must be unique within a folder but can be duplicated in different folders.
To create a top-level folder, click Add at the top of the Pipelines tab and click New folder. Enter a name for the folder and click Create. Folder names can contain alphanumeric characters, dashes, and underscores.
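The folder rules above (allowed characters and a maximum nesting depth of 10) can be expressed as a small check. This is a sketch under stated assumptions rather than Designer's own logic: the function name and the "/" separator are hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and the depth is counted as the number of folder names in the path.

```python
import re

MAX_FOLDER_DEPTH = 10  # documented limit: folders nest up to 10 levels deep

# Assumption: ASCII alphanumerics, dashes, and underscores only.
_FOLDER_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def folder_path_problems(path: str) -> list[str]:
    """Return a list of rule violations for a '/'-separated folder path."""
    parts = path.split("/")
    problems = []
    if len(parts) > MAX_FOLDER_DEPTH:
        problems.append(
            f"{len(parts)} levels exceeds the limit of {MAX_FOLDER_DEPTH}"
        )
    for part in parts:
        if not _FOLDER_NAME_RE.fullmatch(part):
            problems.append(f"invalid folder name: {part!r}")
    return problems


if __name__ == "__main__":
    print(folder_path_problems("sales/daily_loads"))
    print(folder_path_problems("sales/daily loads"))
```

An empty list means the path passes both checks. Note that folder names, unlike pipeline names, are not documented as allowing spaces or parentheses.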
You can collapse or expand a folder by clicking the > icon.
Each folder has its own context menu where you can add a pipeline to the folder, add another folder as a sub-folder, or delete the folder. Click ... next to the folder name to open the context menu.
When you delete a folder, all the pipelines and sub-folders it contains will also be deleted. However, active schedules for pipelines in a deleted folder will continue to use the latest published version of the pipelines. Deleting a folder can't be undone.
You can move a pipeline between folders, or between root and folder, by drag-and-drop. You can't move a pipeline to a folder where another pipeline with the same name already exists, and attempting to do so will result in a warning notice.
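The move rule can be pictured with a toy model in which each folder maps to the set of pipeline names it holds. This is only an illustration of the uniqueness constraint, not how Designer stores pipelines. The `move_pipeline` function and the use of an empty string for the root are hypothetical.

```python
def move_pipeline(
    folders: dict[str, set[str]], name: str, src: str, dst: str
) -> None:
    """Move `name` from folder `src` to folder `dst`, refusing name clashes.

    `folders` maps a folder path ("" stands for the root in this sketch)
    to the set of pipeline names directly inside it.
    """
    if name not in folders.get(src, set()):
        raise KeyError(f"{name!r} is not in folder {src!r}")
    if src == dst:
        return  # nothing to do
    if name in folders.get(dst, set()):
        # Mirrors the documented behavior: the move is refused with a warning.
        raise ValueError(f"{dst!r} already contains a pipeline named {name!r}")
    folders[src].remove(name)
    folders.setdefault(dst, set()).add(name)


if __name__ == "__main__":
    tree = {"": {"load_orders"}, "sales": {"load_orders", "daily"}}
    move_pipeline(tree, "daily", "sales", "")
    print(tree)
```

Because the same name may exist in several folders, a pipeline is identified by its folder and its name together, which is why moving one affects anything that refers to it.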
Warning
Be aware of the following when working with pipeline folders:
- Moving a pipeline into a different folder will break other pipelines that reference it via Run Orchestration or Run Transformation components. You will therefore need to update any pipelines that reference the moved pipeline.
- Any existing schedules that use the moved pipeline will not be updated. The pipelines will have to be republished and the schedule recreated.
- Empty folders cannot be committed to Git. To commit a folder, you need to add a pipeline in that folder. Otherwise, the empty folder will disappear from your pipeline list if you try to commit.
- Moving a pipeline to a different folder is considered a change to your branch, and therefore needs to be committed to Git.
Import and export pipelines
Pipelines can be shared between projects using the export and import functions. Export copies the pipeline definition to a zipped data file in your local filesystem. You can then use the Import function to import that data into another project. You can also export/import an entire pipeline folder, including all the pipelines and sub-folders it contains, or export/import the entire project including all folders and pipelines within it.
Imports are subject to the following size limits:
- Maximum compressed size: 2MB. The import fails if the file exceeds this size.
- Maximum decompressed size: 25MB. The import fails once the decompressed content exceeds this size.
- Maximum entry size: 1MB. An entry that exceeds this size is skipped.
- Maximum entries: 500. The import fails once this number of entries is exceeded.
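The limits listed above can be checked locally before you attempt an import. The following is a rough sketch using Python's standard `zipfile` module. The function name is hypothetical, and several details are assumptions that the documentation does not confirm: "MB" is treated as 2**20 bytes, directory entries are not counted toward the entry limit, and oversized entries still count toward the decompressed total.

```python
import os
import zipfile

MB = 1024 * 1024  # assumption: the documented "MB" is treated as 2**20 bytes
MAX_COMPRESSED = 2 * MB
MAX_DECOMPRESSED = 25 * MB
MAX_ENTRY = 1 * MB
MAX_ENTRIES = 500


def check_import_archive(path: str) -> tuple[list[str], list[str]]:
    """Return (errors, skipped) for the zip file at `path`.

    `errors` lists limits that would make the import fail; `skipped` lists
    entries that are too large and would be left out.
    """
    errors = []
    if os.path.getsize(path) > MAX_COMPRESSED:
        errors.append("compressed size exceeds 2MB")
    with zipfile.ZipFile(path) as archive:
        # Assumption: only file entries count, not directory entries.
        entries = [info for info in archive.infolist() if not info.is_dir()]
    if len(entries) > MAX_ENTRIES:
        errors.append("more than 500 entries")
    # file_size is the uncompressed size recorded in the zip headers.
    if sum(info.file_size for info in entries) > MAX_DECOMPRESSED:
        errors.append("decompressed size exceeds 25MB")
    skipped = [info.filename for info in entries if info.file_size > MAX_ENTRY]
    return errors, skipped
```

Treat the result as an early warning only, since the import itself may measure sizes differently.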
Each export operation creates a single zipped .yaml format data file containing all the exported pipeline definitions. If the export is at a folder or project level, the zip file preserves the pipeline folder structure within it.
The export includes everything required to recreate the exported pipelines in a new project, including all component property values, pipeline variables, component connections, and canvas notes. It does not include project-level configurations such as schedules, secrets, OAuths, and project variables, so you must ensure the target project has these configured as required before running the imported pipeline.
Export a pipeline
In the Pipelines tab, hover over a pipeline, a folder, or the project name at the top of the folder tree, click the ... button that appears next to it, and click Export.
The file will be exported to the default download location of your local filesystem, with a name of the form <project name>_<branch>_<file/folder name>_<yyyymmdd>T<hhmmss>.zip, creating a unique name for each export.
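If you archive or sort exported files with a script, the naming pattern above can be built and read back as follows. This is an illustrative sketch with hypothetical function names. It assumes the project, branch, and item names appear in the file name unchanged, and it does not assume any particular time zone for the timestamp.

```python
from datetime import datetime

_STAMP_FORMAT = "%Y%m%dT%H%M%S"  # <yyyymmdd>T<hhmmss>


def export_file_name(project: str, branch: str, item: str, when: datetime) -> str:
    """Build a name of the documented form for a given export time."""
    return f"{project}_{branch}_{item}_{when.strftime(_STAMP_FORMAT)}.zip"


def export_timestamp(file_name: str) -> datetime:
    """Read the timestamp back from the end of an export file name.

    Only the final underscore-separated field is parsed, because the
    project, branch, and item names may themselves contain underscores.
    """
    stem = file_name.removesuffix(".zip")  # requires Python 3.9+
    return datetime.strptime(stem.rsplit("_", 1)[1], _STAMP_FORMAT)


if __name__ == "__main__":
    name = export_file_name("sales", "main", "daily_load", datetime(2024, 3, 5, 14, 7, 9))
    print(name)
    print(export_timestamp(name))
```

Parsing from the right is a deliberate choice: the other three fields cannot be split reliably when names contain underscores, so only the timestamp is extracted.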
Import a pipeline
Note
A pipeline can only be imported if the importing user has the same user permissions as the user who exported it.
- In the Pipelines tab, hover over the folder that you want to import into, click the ... button that appears next to it, and click Import.
- Browse your filesystem to locate the exported zip file that you want to import, select the file, and click Open.
The pipelines contained in the zip file will be imported to the selected pipeline folder. If you are importing a pipeline folder or an entire project, it will form a sub-folder of the folder you are importing into.