Skip to content

Job concurrency

Overview

This page explains Matillion ETL job queues and concurrently run jobs. An overview of Matillion ETL's process of running jobs is shown below:

Matillion ETL job concurrency


Single instance

When a Matillion ETL job is submitted in Matillion ETL via Message Queues, the API, or the Scheduler, it is queued via the Quartz job scheduler. This is an unbound queue, meaning that there is no imposed limit to the size that this queue can grow to.

The task runner will then begin executing jobs. By default, all Matillion ETL instances allow up to 16 jobs concurrently. The vCPUs of a Matillion ETL instance size determines the maximum number of processes that can run inside a single Matillion ETL job. A single job execution can run up to n*2 processes concurrently (where n is the number of vCPUs on the instance). Multiple runs of the same job queue behind each other—for example, running one job 16 times runs those jobs sequentially, whereas running 16 different jobs runs all of those jobs concurrently.

Although Matillion ETL is suggested for use with specific instances, users can launch Matillion ETL with other instance types with potentially many more vCPUs. There is no hard upper limit on concurrency in this respect.

This can become a problem when there are multiple instances of a job being submitted, especially if the jobs are long-running, because they can conflict with a scheduled job and prevent it from running on its scheduled time. In these scenarios, it's recommended to use a micro-batching pattern.

:::info{title='Note'} Jobs run through the Matillion ETL user interface, and validation tasks, are submitted directly to the task runner, not via the Quartz job scheduler. This means users can bypass any queued jobs and run immediately if the concurrency limit is not reached. :::

In Matillion ETL, the task runner handles validation tasks, and queues these tasks behind any currently running jobs. With this in mind, users should note that once the maximum concurrency limit is reached, any development performed on the same instance may experience long validation times. If this does occur, users may find benefit in separating their production and non-production workloads across unique instances.

Matillion ETL tasks

When a job is running, it's given a pool of threads to run its tasks. The number of threads in each pool defaults to 2*n Vcores. For more information, read our instance sizes guide.


HA Cluster

:::info{title='Note'} Clustered instances are not relevant to Matillion ETL for BigQuery. :::

A cluster is considered two or more single instances. Each instance has its own task runner, and will pick up jobs from a shared Quartz queue. For example, a two-node cluster with 8 vCPUs on each instance will be able to run 2x(2x8) = 32 jobs concurrently as the rule of 2*vCPUs still applies.

The distribution of tasks across nodes can be considered essentially random. However, a node that already has a full queue of 16 tasks won't pick up further tasks until there is at least one free task runner.

When a Matillion ETL job runs on an HA cluster and a node fails (for example, due to a network failure or an instance crash) then within a few seconds the failure will be detected by another node and the job re-submitted from the start. It's important to make your Jobs idempotent, so that they can handle these scenarios.

Jobs executed through the Matillion ETL user interface are executed by whichever node picks up the command.

Jobs executed via the scheduler, message queue, and API are submitted to the Quartz scheduler, which distributes the tasks across the cluster randomly.

The queuing of jobs with the same name applies per-node. For example, if two runs of the same job end up on the same node, the second will queue behind the first; if two runs of the same job end up being run on different nodes, they will run concurrently. This may cause deadlocks, and should generally be avoided by adhering to the advice given in Designing a Job for a High Availability Cluster.

:::info{title='Warning'}

The same transformation job CANNOT be successfully executed more than once at a given moment and thus cannot run in parallel without error, even though this can be attempted. We strongly advise any user that wants to run a multiple instances of the same transformation job at once to use a [Shared job](/metl/docs/shared-jobs#parallel-transformation-jobs) instead. For more information, see [Designing a Job for a High Availability Cluster](/metl/docs/2881277#transformation-jobs).

:::


Sub jobs and shared jobs

When using a Run Orchestration component or shared job, these sub jobs do not queue, since the queuing is done at the top level. The important thing to note here is that a Run Orchestration component can reference the same job multiple times from different parent jobs; however, they will all reference the job using the same job ID. Because of this, only one instance can run at a time. Others will queue behind the currently running instance in the task runner. When running a shared job, this behaviour is different, and each instance of a shared job has a unique ID, meaning multiple instances of the same shared job can run concurrently.


Multiple Connections

Matillion's Multiple Environment Connections feature lets users run multiple ETL jobs across multiple connections. For more information, read Multiple Environment Connections documentation.