Jobs
Overview
Jobs are Matillion ETL's main way of designing, organizing, and executing workflows. The most common usage of Matillion ETL is to build strings of configured components inside a job and then run that job to accomplish a desired task such as loading or transforming data. The area of the UI where components are laid out within a job is called the canvas.
Although most components are common to all platforms, some are only available in certain versions of Matillion ETL. For example, Matillion ETL for Redshift may include Redshift-specific components that Matillion ETL for BigQuery does not, and vice versa.
There are two main flavours of jobs in Matillion ETL: orchestration and transformation.
- Orchestration jobs are primarily concerned with DDL statements (especially creating, dropping, and altering resources) and with loading data from external sources.
- Transformation jobs are used for transforming data that already exists within tables. This includes filtering data, changing data types, and removing rows.
Shared jobs are user-created packages of orchestration and/or transformation jobs that can be used in other jobs in much the same way as a component.
Orchestration jobs
Orchestration Jobs deal with the management of resources (such as tables) as well as loading data from external sources. This typically makes orchestration jobs a user's first real use of Matillion ETL as data must be loaded into a table before being transformed. An orchestration job is chiefly defined by the components it contains.
Data can be loaded using connectors (also referred to as data stagers, integrations, or query components). Read Data staging components for more information.
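To make the idea concrete, the sketch below shows, in plain Python, what a data staging step does in principle: extract rows from an external source and land them in a staging table so that transformation jobs can work on them. This is a conceptual illustration only, not Matillion's implementation or API; sqlite3 stands in for the target warehouse, and the table name and rows are hypothetical.

```python
# Conceptual sketch only: extract rows from an external source and land
# them in a staging table. NOT Matillion's implementation; the source
# data and table names are hypothetical.
import sqlite3

# Hypothetical rows pulled from an external source (e.g. an API or file).
source_rows = [
    ("2024-01-01", "EU", 120.0),
    ("2024-01-01", "US", 340.5),
    ("2024-01-02", "EU", 98.25),
]

conn = sqlite3.connect(":memory:")  # stand-in for the target warehouse
conn.execute(
    "CREATE TABLE stg_sales (sale_date TEXT, region TEXT, amount REAL)"
)
# Load (stage) the extracted rows so transformation jobs can work on them.
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", source_rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM stg_sales").fetchone()[0])  # 3
```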
Transformation jobs
Transformation jobs are, as the name suggests, concerned with transforming data that already exists within tables. This generally comes in the form of components named after the functions and DML commands they represent, such as Rank and Aggregate. A transformation job is chiefly defined by the components it contains.
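Components such as Rank and Aggregate correspond to familiar SQL constructs like window functions and GROUP BY aggregates, which run in the target warehouse. The sketch below is illustrative only: it uses Python's sqlite3 as a stand-in for the warehouse, with hypothetical table and column names, to show roughly the kind of SQL these transformations represent.

```python
# Illustrative only: the kind of SQL that Rank- and Aggregate-style
# transformations represent. sqlite3 stands in for the target warehouse;
# table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [("2024-01-01", "EU", 120.0),
     ("2024-01-01", "US", 340.5),
     ("2024-01-02", "EU", 98.25)],
)

# Aggregate: total amount per region (GROUP BY).
for row in conn.execute(
    "SELECT region, SUM(amount) AS total_amount FROM stg_sales GROUP BY region"
):
    print(row)

# Rank: order rows within each region by amount (window function;
# requires SQLite 3.25+ for window function support).
for row in conn.execute(
    "SELECT region, amount, "
    "RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank "
    "FROM stg_sales"
):
    print(row)
```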
Unlike orchestration jobs, transformation jobs have no specific 'Start' point, and multiple flows can be set to run at once by creating multiple strings of components. Users may want to consider Job Concurrency when designing such jobs.
Shared jobs
Shared jobs are packaged jobs, created by users, that can be used similarly to how a component is used. A shared job can be conceptually understood as a user-created component, which could be as simple as a single Python Script component or as complex as an entire ETL workflow.
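As an illustration of the simple end of that spectrum, a shared job might wrap nothing more than a Python Script component. The sketch below is hedged: the job variable names (source_dir, file_count) are hypothetical, and while context.updateVariable is the call the Python Script component exposes for writing back to job variables, treat the exact API as something to confirm against your instance.

```python
# Hedged sketch of a Python Script component that a shared job might wrap.
# Job variable names (source_dir, file_count) are hypothetical; confirm the
# context.updateVariable call against your Matillion ETL instance.
import os

# 'source_dir' would be a job variable passed into the shared job; fall back
# to a default so the sketch also runs outside Matillion.
source_dir = globals().get("source_dir", "/tmp")

# Count the files waiting to be processed.
files = [f for f in os.listdir(source_dir)
         if os.path.isfile(os.path.join(source_dir, f))]

# Hand the result back to the calling job as a job variable.
try:
    context.updateVariable("file_count", str(len(files)))
except NameError:
    # Running outside Matillion (e.g. a local test), where 'context' is absent.
    print("file_count =", len(files))
```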
Smart use of shared jobs allows users to reduce the complexity of their workspace by packaging repeatable, complex workflows into a single shared job that can then be reused in other workflows. In many ways this is a neater, more configurable, and more portable alternative to linking jobs together with the Run Orchestration and Run Transformation components.
For more information, see Shared Jobs.