Scaling best practices
Maia executes pipelines via a Maia agent. This works by decomposing your pipelines into tasks, which are then distributed across the instances (nodes) of a Maia agent.
When you host Maia agents within your own VPC or VNet (also known as a Hybrid SaaS solution), you need to right-size them to obtain the level of performance and concurrency you require.
This guide describes the key considerations.
Note
If you're using our Full SaaS offering, please contact us to discuss scaling your Maia agent.
Tasks
Tasks are the smallest unit of work a Maia agent can execute and consist of items such as:
- A single orchestration component execution.
- A specific execution of a transformation pipeline.
Note
Using Designer also generates tasks, for actions such as running a sample operation or loading a list of tables or columns. However, these design-time tasks are not subject to any limits.
Maia agent instances
In Maia, work is executed by Maia agents, and each Maia agent is made up of one or more Maia agent instances. In practice, these are containers: a Maia agent is a named collection of containers, and each container is a Maia agent instance.
When pipeline tasks are sent to a Maia agent, they will be sent to any Maia agent instance that has capacity. If there is no current capacity, then the pipeline task will be queued. When an instance subsequently has capacity, the pipeline task will be sent to that instance and be executed.
Note
Configure each Maia agent with a minimum of 2 Maia agent instances. Maia agent instances are upgraded in a staggered fashion, so running at least 2 ensures the automatic upgrade process does not cause a service outage.
Maia agent instance capacity
To protect the stability of the Maia agent instances under load, a Maia agent instance won't take on a new task if:
- The CPU usage exceeds 80%.
- RAM usage reaches the default maximum heap size (60% of system RAM).
- The Maia agent instance is already running 20 concurrent tasks.
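The three thresholds above can be modelled as a simple admission check. This is a minimal sketch for reasoning about capacity, not Matillion's implementation: the function names are hypothetical, CPU usage is assumed to be a whole-number percentage, and memory values are assumed to be in MB.

```shell
#!/usr/bin/env bash
# Illustrative model of the three admission thresholds listed above.
# Hypothetical helper names; this is not Matillion's actual implementation.

MAX_CPU_PERCENT=80        # refuse new tasks once CPU usage exceeds 80%
HEAP_PERCENT_OF_RAM=60    # default maximum heap size is 60% of system RAM
MAX_CONCURRENT_TASKS=20   # per-instance concurrency limit

# Print the default maximum heap size (MB) for a given amount of RAM (MB).
default_max_heap_mb() {
  echo $(( $1 * HEAP_PERCENT_OF_RAM / 100 ))
}

# can_accept_task CPU_PERCENT HEAP_USED_MB RAM_MB RUNNING_TASKS
# Returns 0 if the instance would take a new task, 1 if the task would queue.
can_accept_task() {
  local cpu=$1 heap_used_mb=$2 ram_mb=$3 running=$4
  local heap_max_mb
  heap_max_mb=$(default_max_heap_mb "$ram_mb")

  [ "$cpu" -gt "$MAX_CPU_PERCENT" ] && return 1           # CPU above 80%
  [ "$heap_used_mb" -ge "$heap_max_mb" ] && return 1      # heap limit reached
  [ "$running" -ge "$MAX_CONCURRENT_TASKS" ] && return 1  # already 20 tasks
  return 0
}

# Examples (8192 MB of RAM gives a default maximum heap of 4915 MB):
#   can_accept_task 45 3000 8192 12   # succeeds: the task is accepted
#   can_accept_task 45 3000 8192 20   # fails: the task queues
```

Any one threshold is enough to make a task queue, which is why an instance with plenty of spare RAM can still refuse work once it is running 20 tasks.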
Tasks that can't execute because there is no available Maia agent instance will queue until a Maia agent instance becomes available.
Note
If you consistently see tasks queuing or Maia agent instances frequently reaching these thresholds, consider scaling your deployment by adding more Maia agent instances.
Scaling for load
Horizontal scaling
Each Maia agent instance is limited to 20 concurrent tasks at any one time, regardless of the resources assigned to it. As a result, highly concurrent pipelines can cause tasks to queue, which increases overall pipeline execution time.
Horizontal scaling involves adding more Maia agent instances. Each additional instance increases the number of tasks that can run in parallel (two Maia agent instances allow 40 concurrent tasks, three allow 60, and so on), which reduces task queuing.
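The arithmetic above can be turned into a rough sizing helper. This is a sketch under stated assumptions: the function names are hypothetical, the peak number of concurrent tasks is an estimate you supply from your own pipelines, and the floor of 2 instances comes from the earlier note on staggered upgrades.

```shell
#!/usr/bin/env bash
# Rough sizing helper for the 20-concurrent-task limit per Maia agent instance.
# Hypothetical function names; the peak task count is an estimate you supply.

TASKS_PER_INSTANCE=20
MIN_INSTANCES=2   # recommended minimum, so staggered upgrades avoid an outage

# instances_needed PEAK_CONCURRENT_TASKS
# Prints the smallest instance count that runs every task without queuing
# on the concurrency limit, never going below MIN_INSTANCES.
instances_needed() {
  local peak=$1
  # Ceiling division: round up so a partial batch still gets an instance.
  local n=$(( (peak + TASKS_PER_INSTANCE - 1) / TASKS_PER_INSTANCE ))
  [ "$n" -lt "$MIN_INSTANCES" ] && n=$MIN_INSTANCES
  echo "$n"
}

# max_concurrent_tasks INSTANCES -> total tasks that can run in parallel
max_concurrent_tasks() {
  echo $(( $1 * TASKS_PER_INSTANCE ))
}

# Examples:
#   instances_needed 45      # prints 3
#   instances_needed 10      # prints 2 (the recommended minimum)
#   max_concurrent_tasks 3   # prints 60
```

This estimate only accounts for the concurrency limit. Ingestion-heavy or scripting-heavy workloads may reach the CPU or memory thresholds first and need more instances than this suggests.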
The method for adding Maia agent instances varies depending on your container orchestrator. See here for detailed instructions:
Cost implications
Adding more Maia agent instances does not result in extra charges from Matillion. Our credit charges are based on task execution time. Task queuing time does not consume credits.
However, running extra containers (e.g. Maia agent instances) is likely to increase the infrastructure cost from your container orchestrator (e.g. AWS Fargate).
Choosing how many Maia agent instances to run is therefore a balance between the performance you need and your infrastructure cost and budget.
Transformation tasks - low load
Since transformation tasks generate SQL that is then executed by your cloud data warehouse, these tasks do not require a large amount of CPU time or memory on the Maia agent instances. With this in mind, if your workload is "transformation heavy", a smaller Maia agent (with a low number of Maia agent instances) will likely suffice.
Data ingestion and scripting - high load
Components that move or ingest data, as well as those that execute custom scripts such as Python or Bash, place a high CPU and memory burden on Maia agent instances. If your workloads involve a high volume of data ingestion or custom scripting, you'll need to run a larger number of Maia agent instances.
Further considerations
Scale-up delay
Once you have edited the Maia agent service to start more Maia agent instances, there is a delay of approximately 4 minutes for the Maia agent instances to start and dial back to Maia to begin accepting tasks.
Scaling AWS Maia agents from within a pipeline
If you use AWS ECS Fargate to run your Maia agents, you can scale them up and down from within a pipeline. The AWS command line tools are available in the Bash Script component, and you can use them to edit the ECS service and change the desired number of instances.
Bash scripts executed in Hybrid SaaS environments obtain the IAM permissions assigned to the Maia agent. If these permissions include amending an ECS Fargate service, then a script can be used to change the number of Maia agent instances.
The following script can be used in a Bash Script orchestration component to do this:
###
# This script alters the desired task count for a Maia agent's ECS service.
# Set the variables below to the values shown in the AWS ECS console.
# Note: new agent instances usually take around 4 minutes to become available for task processing.
###

AWS_REGION="<AWS region, e.g. eu-west-1>"
AWS_ECS_SERVICE="<ECS service name>"
AWS_ECS_CLUSTER="<ECS cluster name>"
DESIRED_AGENT_COUNT=2

# Update the service's desired count; ECS then starts or stops agent containers to match.
aws ecs update-service \
  --cluster "$AWS_ECS_CLUSTER" \
  --service "$AWS_ECS_SERVICE" \
  --desired-count "$DESIRED_AGENT_COUNT" \
  --region "$AWS_REGION"
This script needs the following permission in the IAM role that the Maia agent's task definition uses as its task role:
ecs:UpdateService
You must add this permission before running the script, as it's not added by default.
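As a minimal sketch, assuming you manage the task role's policies yourself, the function below prints an inline IAM policy that grants only ecs:UpdateService, scoped to a single ECS service. The function name, the policy name and the role name in the example are hypothetical; substitute your own region, account ID, cluster and service.

```shell
#!/usr/bin/env bash
# Print a minimal IAM policy that allows ecs:UpdateService on one ECS service.
# Service ARN format: arn:aws:ecs:REGION:ACCOUNT_ID:service/CLUSTER/SERVICE
# All argument values are placeholders you supply.

update_service_policy() {
  local region=$1 account_id=$2 cluster=$3 service=$4
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecs:UpdateService",
      "Resource": "arn:aws:ecs:${region}:${account_id}:service/${cluster}/${service}"
    }
  ]
}
EOF
}

# Example (hypothetical names):
#   update_service_policy eu-west-1 123456789012 my-cluster my-maia-agent > policy.json
#   aws iam put-role-policy --role-name <task role name> \
#     --policy-name maia-agent-scaling --policy-document file://policy.json
```

Scoping the Resource to one service, rather than using a wildcard, means a script running on the Maia agent can resize only its own service.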
Note
- New Maia agent instances take around 4 minutes to become available. Please consider this when scheduling your scaling events.
- Be aware of other pipelines or users that may rely on the Maia agent: resizing it affects every user and pipeline that uses it.
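If a pipeline scales up and then immediately starts heavy work, it can help to block until ECS has finished starting the new containers. The helper below is a hypothetical sketch built on two real AWS CLI commands, aws ecs update-service and aws ecs wait services-stable. It assumes the task role also allows ecs:DescribeServices, which the wait command uses and which the script above does not need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the desired Maia agent instance count on an ECS
# service, then wait until ECS reports the service as stable.
# Assumes the AWS CLI is installed and the task role allows both
# ecs:UpdateService and ecs:DescribeServices (used by the wait command).

scale_maia_agent() {
  local cluster=$1 service=$2 region=$3 desired=$4

  # Reject anything that is not a whole number.
  case "$desired" in
    ''|*[!0-9]*) echo "desired count must be a whole number" >&2; return 1 ;;
  esac
  # Fewer than 2 instances risks an outage during staggered upgrades.
  if [ "$desired" -lt 2 ]; then
    echo "warning: fewer than 2 Maia agent instances requested" >&2
  fi

  aws ecs update-service --cluster "$cluster" --service "$service" \
    --desired-count "$desired" --region "$region" >/dev/null || return 1

  # Returns once ECS reports the service as stable (or fails on timeout).
  # It does not confirm that the new instances have connected to Maia.
  aws ecs wait services-stable --cluster "$cluster" --services "$service" \
    --region "$region"
}

# Example (values are placeholders):
#   scale_maia_agent my-cluster my-maia-agent-service eu-west-1 4
```

Even after the wait returns, allow time for the new instances to connect back to Maia before relying on the extra capacity.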
Scaling Snowflake
If you're not seeing the expected performance, even when using a Matillion Full SaaS solution of sufficient size for your workloads, the Snowflake warehouse defined in your Matillion environment might need scaling. Read Monitoring Warehouse Load to learn more.
If a warehouse is overloaded with many parallel queries, the queries will queue. A large queued time in the Snowflake warehouse load chart indicates that pipeline performance will benefit from scaling the warehouse.
Matillion recommends enabling multi-cluster warehouses if this feature is available in your Snowflake account. With this mechanism, Snowflake scales the warehouse horizontally by automatically starting and stopping additional clusters. Matillion has found that this improves concurrent performance more than simply increasing the size of the warehouse.
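A multi-cluster warehouse is configured with Snowflake's MIN_CLUSTER_COUNT, MAX_CLUSTER_COUNT and SCALING_POLICY warehouse parameters. The sketch below only prints the ALTER WAREHOUSE statement; the function name and the warehouse name MATILLION_WH are hypothetical, and multi-cluster warehouses require Snowflake Enterprise Edition or higher.

```shell
#!/usr/bin/env bash
# Print an ALTER WAREHOUSE statement that enables multi-cluster scaling.
# Warehouse name and cluster limits are placeholders you supply.

multi_cluster_sql() {
  local warehouse=$1 min_clusters=$2 max_clusters=$3
  cat <<EOF
ALTER WAREHOUSE ${warehouse} SET
  MIN_CLUSTER_COUNT = ${min_clusters}
  MAX_CLUSTER_COUNT = ${max_clusters}
  SCALING_POLICY = 'STANDARD';
EOF
}

# Example: print the statement, then run it in a Snowflake worksheet.
#   multi_cluster_sql MATILLION_WH 1 3
```

The STANDARD scaling policy favours starting extra clusters to minimise query queuing, whereas ECONOMY favours conserving credits; STANDARD suits the queuing problem described above.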
Scaling Maia agent for Snowflake
You can scale the number of Maia agent instances within Snowflake, which allows you to increase or decrease concurrency to better handle your pipeline workload.
Warning
Scaling down the number of Maia agent instances while pipelines are running and the Maia agent is not in a Paused state can lead to pipeline failures.
- From your Snowflake Home screen, click Data Products → Apps. You must be using the role that originally installed the application.
- Locate Matillion Maia in the list of apps, and click to select it. If you have multiple installs of the Native App, select the one you wish to scale.
- Click the Control Panel tab.
- Under Agent scaling enter the number of Maia agent instances you want to run. The maximum number of Maia agent instances is 10.
- Click Apply, then Apply again to confirm.
This will put the compute pool into a Resizing state and change the Maia agent status to Pending. The Maia agent status will change to Running after a few minutes.
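To see what the limit of 10 instances means for concurrency, the hypothetical helper below validates a requested instance count and prints the resulting task ceiling. The upper limit of 10 comes from the steps above; the lower bound of 1, and applying the 20-tasks-per-instance limit to instances running in Snowflake, are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical check for the Snowflake agent scaling value.
# The upper limit of 10 comes from the steps above; the lower bound of 1 and
# the 20-tasks-per-instance figure applied here are assumptions.

SNOWFLAKE_MAX_INSTANCES=10
TASKS_PER_INSTANCE=20

# check_snowflake_agent_count REQUESTED
# Prints the resulting maximum number of concurrent tasks, or fails if the
# requested instance count is outside 1..10.
check_snowflake_agent_count() {
  local requested=$1
  if [ "$requested" -lt 1 ] || [ "$requested" -gt "$SNOWFLAKE_MAX_INSTANCES" ]; then
    echo "instance count must be between 1 and $SNOWFLAKE_MAX_INSTANCES" >&2
    return 1
  fi
  echo $(( requested * TASKS_PER_INSTANCE ))
}

# Example:
#   check_snowflake_agent_count 10   # prints 200, the concurrency ceiling
```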