Scaling best practices
Maia executes pipelines via a Maia runner. This works by decomposing your pipelines into tasks, which are then distributed across the instances (nodes) of a Maia runner.
When hosting Maia runners within your VPC or VNet (also known as a Hybrid SaaS solution), you need to right-size your Maia runners to obtain the level of performance and concurrency you need.
This guide provides details of the key considerations.
Note
If you're using our Full SaaS offering, we advise you to contact us to discuss scaling your Maia runner.
Tasks
Tasks are the smallest unit of work a Maia runner can execute and consist of items such as:
- A single orchestration component execution.
- A specific execution of a transformation pipeline.
Note
Using Designer also generates tasks, for actions such as running a sample operation or loading a list of tables or columns. However, these design-time tasks are not limited.
Maia runner instances
In Maia, work is executed via Maia runners, and each Maia runner is made up of Maia runner instances. In practice, this is implemented using containers. A Maia runner is a named collection of containers, with each container being known as a Maia runner instance.
When pipeline tasks are sent to a Maia runner, they will be sent to any Maia runner instance that has capacity. If there is no current capacity, then the pipeline task will be queued. When an instance subsequently has capacity, the pipeline task will be sent to that instance and be executed.
Note
Maia runners should be configured with a minimum of 2 Maia runner instances to ensure the automatic upgrade process does not cause a service outage—since Maia runner instances will be upgraded in a staggered fashion.
Maia runner instance capacity
To protect the stability of the Maia runner instances under load, a Maia runner instance won't take on a new task if:
- The CPU usage exceeds 80%.
- RAM usage reaches the default maximum heap size (60% of system RAM).
- The Maia runner instance is already running 20 concurrent tasks.
Tasks that can't execute because there is no available Maia runner instance will queue until a Maia runner instance becomes available.
Note
If you consistently see tasks queuing or Maia runner instances frequently reaching these thresholds, consider scaling your deployment by adding more Maia runner instances.
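The capacity thresholds listed above can be expressed as a small check. This is a minimal sketch only: the function name `can_accept_task` and its three arguments are hypothetical and are not part of Maia, which applies these rules internally.

```shell
# Hypothetical helper, not part of Maia: mirrors the documented thresholds.
# Arguments (all integers):
#   $1 = CPU usage in percent
#   $2 = RAM usage as a percentage of system RAM
#   $3 = number of tasks currently running on the instance
# Returns 0 if a new task would be accepted, 1 if it would be queued.
can_accept_task() {
  cpu_pct=$1
  ram_pct=$2
  running_tasks=$3

  # CPU usage exceeds 80%
  if [ "$cpu_pct" -gt 80 ]; then return 1; fi
  # RAM usage reaches the default maximum heap size (60% of system RAM)
  if [ "$ram_pct" -ge 60 ]; then return 1; fi
  # The instance is already running 20 concurrent tasks
  if [ "$running_tasks" -ge 20 ]; then return 1; fi
  return 0
}
```

For example, `can_accept_task 45 30 12` succeeds (exit status 0), while `can_accept_task 45 30 20` fails because the instance is already running 20 tasks.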
Scaling for load
Horizontal scaling
Each Maia runner instance is limited to 20 concurrent tasks, regardless of the resources assigned to it. As a result, pipelines with a high level of concurrency can cause tasks to queue, which lengthens overall pipeline execution time.
Horizontal scaling involves adding more Maia runner instances. By adding more instances, you increase the number of tasks that can run in parallel—two Maia runner instances allow 40 concurrent tasks to be executed, and so on—reducing task queuing.
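The arithmetic above can be sketched as a ceiling division. The helper name `instances_needed` is hypothetical. The sketch considers only the limit of 20 tasks per instance, so CPU or memory pressure may mean you need more instances than it reports. It also applies the recommended minimum of 2 instances.

```shell
# Hypothetical helper: minimum number of Maia runner instances needed to run
# a given number of tasks concurrently without queuing, based only on the
# documented limit of 20 concurrent tasks per instance.
TASKS_PER_INSTANCE=20

instances_needed() {
  peak_tasks=$1
  # Integer ceiling division: (n + d - 1) / d
  needed=$(( (peak_tasks + TASKS_PER_INSTANCE - 1) / TASKS_PER_INSTANCE ))
  # Keep at least 2 instances so staggered upgrades do not cause an outage
  if [ "$needed" -lt 2 ]; then needed=2; fi
  echo "$needed"
}
```

For example, `instances_needed 40` prints `2`, and `instances_needed 41` prints `3`.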
The method for adding Maia runner instances varies depending on your container orchestrator. See the deployment instructions for your orchestrator for details.
Cost implications
Adding more Maia runner instances does not result in extra charges from Matillion. Our credit charges are based on task execution time. Task queuing time does not consume credits.
However, running extra containers (e.g. Maia runner instances) is likely to increase the infrastructure cost from your container orchestrator (e.g. AWS Fargate).
As such, choosing the number of Maia runner instances means balancing the performance you need against your infrastructure cost and budget.
Transformation tasks - low load
Since transformation tasks generate SQL that is then executed by your cloud data warehouse, these tasks do not require a large amount of CPU time or memory on the Maia runner instances. With this in mind, if your workload is "transformation heavy", a smaller Maia runner (with a low number of Maia runner instances) will likely suffice.
Data ingestion and scripting - high load
Components that move or ingest data, as well as those that execute customer scripts such as Python or Bash, place a high CPU and memory burden on Maia runner instances. If your workloads involve high-volume data ingestion or custom scripting, you'll need to run a larger number of Maia runner instances.
Further considerations
Scale up delay
Once you have edited the Maia runner service to start more Maia runner instances, there is a delay of approximately 4 minutes for the Maia runner instances to start and dial back to Maia to begin accepting tasks.
Scaling AWS Maia runners from within a pipeline
If you use AWS ECS Fargate to run your Maia runners, you can scale up and down within a pipeline. The AWS command line tools are available within the Bash Script component, and you can use them to edit the ECS service to change the desired number of instances.
Bash scripts executed in Hybrid SaaS environments obtain the IAM permissions assigned to the Maia runner. If these permissions include amending an ECS Fargate service, then a script can be used to change the number of Maia runner instances.
The following script can be used within a Bash Script orchestration component to do this:
###
# This script alters the desired instance count of the ECS service that runs a Maia runner.
# Set the variables below to the values shown in the AWS ECS console.
# Note: new Maia runner instances usually take around 4 minutes to be available for task processing.
###

# Placeholders are quoted so that the unedited script is still valid Bash.
AWS_REGION="<AWS region e.g. eu-west-1>"
AWS_ECS_SERVICE="<ECS Fargate Service Name>"
AWS_ECS_CLUSTER="<ECS Cluster Name>"
DESIRED_AGENT_COUNT=2

# Update the desired number of instances on the ECS service.
aws ecs update-service --service "$AWS_ECS_SERVICE" --desired-count "$DESIRED_AGENT_COUNT" --region "$AWS_REGION" --cluster "$AWS_ECS_CLUSTER"
This script needs the following permission in the IAM role attached to the Task Definition as the Task Role for the Maia runner:
ecs:UpdateService
You must add this permission before running the script, as it's not added by default.
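As an illustration, a minimal IAM policy document granting this permission could be generated as shown here. The helper name `print_update_service_policy` and the example values are hypothetical. The resource ARN assumes the long ECS service ARN format (`service/<cluster>/<service>`), so check the ARN shown for your service in the AWS console before using it.

```shell
# Hypothetical helper: prints a minimal IAM policy document that allows
# ecs:UpdateService on a single ECS service.
# Arguments: region, AWS account ID, cluster name, service name.
print_update_service_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecs:UpdateService",
      "Resource": "arn:aws:ecs:$1:$2:service/$3/$4"
    }
  ]
}
EOF
}

# Example with placeholder values: write the policy to a file for review.
# print_update_service_policy eu-west-1 123456789012 my-cluster my-service > policy.json
```

Scoping the resource to one service, rather than using `*`, limits the script to resizing only the Maia runner's own service.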
Note
- New Maia runner instances take around 4 minutes to become available. Please consider this when scheduling your scaling events.
- Be aware of other pipelines or users who may be relying on the Maia runner—this resize will affect any users or pipelines using the Maia runner.
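To account for the start-up delay, a pipeline can wait for the ECS service to settle before running its heavy workload. The sketch below is one possible approach, not a Matillion-provided script. The function name `scale_runner` and the `AWS_CMD` override are hypothetical. It uses `aws ecs wait services-stable`, which needs the `ecs:DescribeServices` permission in addition to `ecs:UpdateService`. A stable ECS service means the containers are running, but it does not by itself confirm that each Maia runner instance has connected back to Maia.

```shell
# Sketch only. AWS_CMD is a hypothetical override that lets you substitute a
# stub for the real AWS CLI when testing; it defaults to "aws".
AWS_CMD="${AWS_CMD:-aws}"

# Usage: scale_runner REGION CLUSTER SERVICE DESIRED_COUNT
scale_runner() {
  region=$1
  cluster=$2
  service=$3
  desired=$4

  # Design choice of this sketch: refuse to go below the recommended
  # minimum of 2 Maia runner instances.
  if [ "$desired" -lt 2 ]; then
    echo "Refusing to scale below 2 Maia runner instances" >&2
    return 1
  fi

  # Change the desired number of instances on the ECS service.
  "$AWS_CMD" ecs update-service --region "$region" --cluster "$cluster" \
    --service "$service" --desired-count "$desired" >/dev/null || return 1

  # Block until ECS reports the service as stable.
  "$AWS_CMD" ecs wait services-stable --region "$region" --cluster "$cluster" \
    --services "$service"
}

# Example with placeholder values:
# scale_runner eu-west-1 my-cluster my-service 4
```

Wrapping the calls in a function keeps the scale-up and the wait together, so later components in the pipeline only start once the function returns successfully.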
Scaling Snowflake
If you're not seeing the performance expected—even when using a Matillion Full SaaS solution of sufficient size for your workloads—it might be that your Snowflake warehouse defined in the Matillion environment needs scaling. Read Monitoring Warehouse Load to learn more.
If a warehouse is overloaded with many parallel queries, the queries will queue. A large queue time shown in the Snowflake graph indicates pipeline performance will benefit from scaling the warehouse.
Matillion recommends enabling multi-cluster warehousing if this is available in your Snowflake account. Using this mechanism, Snowflake will horizontally scale the warehouse by starting and stopping clusters of the warehouse automatically. Matillion has found that this improves concurrent performance more than simply increasing the size of the warehouse.
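As a rough illustration, multi-cluster scaling is configured on a warehouse with an `ALTER WAREHOUSE` statement. The helper below only prints such a statement so that you can review it and run it in your preferred Snowflake client. The function name `multi_cluster_sql`, the warehouse name, and the cluster counts are placeholders, and the `STANDARD` scaling policy is an assumption rather than a Matillion recommendation.

```shell
# Hypothetical helper: prints an ALTER WAREHOUSE statement that enables
# multi-cluster scaling.
# Arguments: warehouse name, minimum cluster count, maximum cluster count.
multi_cluster_sql() {
  printf "ALTER WAREHOUSE %s SET MIN_CLUSTER_COUNT = %s MAX_CLUSTER_COUNT = %s SCALING_POLICY = 'STANDARD';\n" "$1" "$2" "$3"
}

# Example with placeholder values:
# multi_cluster_sql MY_WH 1 3
```

Setting the minimum below the maximum lets Snowflake add clusters when queries queue and remove them when load drops.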
Scaling Maia runner for Snowflake
You can scale the number of Maia runner instances within Snowflake, which allows you to increase or decrease concurrency to better handle your pipeline workload.
Warning
Scaling down the number of Maia runner instances while pipelines are running and the Maia runner is not in a Paused state can lead to pipeline failures.
- From your Snowflake Home screen, click Data Products → Apps. You must be using the role that originally installed the application.
- Locate Matillion Maia in the list of apps, and click to select it. If you have multiple installs of the Native App, select the one you wish to scale.
- Click the Control Panel tab.
- Under Agent scaling enter the number of Maia runner instances you want to run. The maximum number of Maia runner instances is 10.
- Click Apply, then Apply again to confirm.
This will put the compute pool into a Resizing state and change the Maia runner status to Pending. The Maia runner status will change to Running after a few minutes.