
Launching a Matillion ETL HA Cluster via AWS

This page is a tutorial for creating a new Matillion ETL clustered instance using a CloudFormation template.

This tutorial uses Amazon Web Services (AWS) as the cloud provider and Snowflake as the cloud data warehouse. The same steps apply if Amazon Redshift is chosen as the cloud data warehouse.


Selecting a Clustered Enterprise CloudFormation template

  1. Log in to the Hub, and choose the account you want to work in.
  2. The What do you want to do today? page is displayed. Click the Add Matillion ETL instance link.
  3. Select your cloud provider. In this tutorial, AWS is selected.
  4. Select your cloud data platform. In this tutorial, Snowflake is selected.
  5. On the How do you want to deliver Matillion ETL for Snowflake? page, choose CloudFormation Template.
  6. On the In which region do you want to run Matillion? page, choose your preferred AWS region.
  7. On the How do you want to deploy a Virtual Private Cloud (VPC)? page, choose your preferred option. For this tutorial, Deploy to an existing VPC in my AWS environment is selected. Choose Set up a new VPC if you want to create a new VPC and new subnets.
  8. On the Choose a Matillion CloudFormation template page, choose Clustered Enterprise.
  9. On the Thank you. We will redirect you to AWS page, you can invite team members to assist with the configuration, or choose Continue in AWS to begin creating the AWS stack.

You will then be redirected to the AWS console to finish the configuration of your HA cluster.


Configuring your HA cluster in the AWS console

This section of the tutorial covers the Quick create stack page in the AWS console, which opens after you have been redirected to AWS from the Hub.

Template

AWS automatically assigns a template URL and a stack description to the new stack, based on the choices you made earlier in the Hub.

Stack name

AWS supplies a name for the stack, but you can edit it. Stack names can contain only letters (A-Z and a-z), numbers (0-9), and hyphens (-), and must start with a letter. The stack name must be unique within the AWS region.
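A stack name can be checked locally before you submit the form. The sketch below encodes CloudFormation's naming rule (a letter first, then letters, digits, or hyphens, at most 128 characters in total). `is_valid_stack_name` is a hypothetical helper, not part of any AWS tool, and uniqueness can only be confirmed by AWS itself.

```python
import re

# CloudFormation stack-name syntax: a letter, then up to 127 more letters,
# digits, or hyphens (128 characters maximum).
_STACK_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def is_valid_stack_name(name: str) -> bool:
    """Return True if `name` satisfies the stack-name syntax.

    Uniqueness within the region cannot be checked locally.
    """
    return _STACK_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["matillion-etl-ha", "1-matillion", "matillion_etl"]:
        print(candidate, is_valid_stack_name(candidate))
```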

Parameters

Parameters are defined in your template and allow you to input custom values when you create or update a stack.
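If you want to review the template's parameters outside the console, CloudFormation's `GetTemplateSummary` API returns them. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names are hypothetical and the template URL is a placeholder, so substitute the URL shown on your Quick create stack page.

```python
from typing import Any, Dict, List


def summarize_parameters(summary: Dict[str, Any]) -> List[str]:
    """Format the Parameters list of a GetTemplateSummary response.

    Each entry has a ParameterKey and, optionally, a DefaultValue.
    """
    lines = []
    for param in summary.get("Parameters", []):
        default = param.get("DefaultValue", "<no default>")
        lines.append(f"{param['ParameterKey']} (default: {default})")
    return lines


def fetch_summary(template_url: str, region: str) -> Dict[str, Any]:
    # Imported lazily so summarize_parameters works without boto3.
    import boto3  # requires boto3 and AWS credentials

    client = boto3.client("cloudformation", region_name=region)
    return client.get_template_summary(TemplateURL=template_url)


if __name__ == "__main__":
    # Placeholder URL: replace with the template URL from the Quick create stack page.
    url = "https://example-bucket.s3.amazonaws.com/matillion-template.json"
    for line in summarize_parameters(fetch_summary(url, "eu-west-1")):
        print(line)
```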

| Parameter | Description |
|---|---|
| Instance Configuration | |
| Instance Type | Matillion ETL instance size. Larger sizes allow more concurrent tasks. See https://www.matillion.com/pricing/ for more information. |
| Networking and Security Configuration | |
| Keypair Name | The name of an existing EC2 key pair, used for SSH access to the Matillion ETL instance(s). |
| VPC Id | The VPC in which to create security groups. This must be the VPC that contains the subnets. |
| Primary Subnet | An existing public subnet into which to launch the Matillion ETL EC2 instance(s). |
| Secondary Subnet | A second existing public subnet into which to launch the Matillion ETL EC2 instance(s). |
| Private Subnets | Two or more private subnets across multiple Availability Zones (AZs), used by secondary resources such as Postgres failover. |
| Security Group | The security group to associate with the Matillion ETL instance(s). It must allow at least port 80 or 443, plus port 22 for SSH and port 5701 for clustering. |
| ALB Configuration | |
| DNS Prefix | The load balancer DNS name prefix. For example, the prefix is matillion in matillion-1731869672.eu-west-1.elb.amazonaws.com. |
| Security Group IPv4 CIDR | The inbound IPv4 CIDR range for the application load balancer. |
| RDS Repository Configuration | |
| Master Username | The initial Postgres username. This user is given an admin role. |
| Master Password | The Postgres password. Must contain at least one uppercase letter, one lowercase letter, and one digit (0-9). Can't contain spaces, quotes, @, or slash characters. |
| Port | The TCP/IP port that the database instance uses for application connections. |
| Database Name | The Postgres database in which Matillion ETL stores its metadata repository. |
| Instance Class | The database instance class (size). |
| Storage Size | The size of the database, in gigabytes (GB). |
| Matillion ETL Realm Configuration | For help with setting up the LDAP realm configuration, read LDAP Integration. |
| Username | The connection username. Example: administrator@INTERNAL.DOMAIN.COM |
| Connection Password | The password for the connection username, used for the initial bind. |
| URL | The URL of your directory server. Example: ldap://10.10.10.254:389 |
| User Base | The subtree below which users are stored in the directory tree. Example: cn=Users,dc=INTERNAL,dc=domain,dc=com |
| User Search | The LDAP attribute used to identify users. Example: sAMAccountName={0} |
| Role Base | The subtree below which groups are stored in the directory tree. Example: cn=Groups,dc=INTERNAL,dc=domain,dc=com |
| Role Name | The LDAP attribute used to identify a group or role. Example: cn |
| Role Search | The LDAP attribute used to identify groups or roles. Example: member={0} |
| User Subtree | Sets the scope of the search. Select true to search the entire subtree rooted at the User Base entry. Select false (the default) to search only the top level. |
| Login Role | The name of an existing group in the directory server whose users will be allowed to log in. Role names are case-sensitive. |
| Admin Role | The name of an existing group in the directory server whose users will be allowed to administer Matillion ETL. Role names are case-sensitive. |
| Project Admin Role | The name of an existing group in the directory server whose users will be allowed to administer Matillion ETL projects. Role names are case-sensitive. |
| API Role | The name of an existing group in the directory server whose users will be allowed to use the Matillion ETL API. Role names are case-sensitive. |
| MatillionProduct | Confirms the target data warehouse. This is auto-populated with the data warehouse you selected in the Hub. |
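The Master Password rules in the table can be checked before you submit the form. The sketch below is a hypothetical helper that checks only the rules listed there. It assumes "quotes" means both ' and " and "slash characters" means /; the template or Amazon RDS may enforce further constraints (such as length) that this does not cover.

```python
import string
from typing import List

# Assumed reading of "spaces, quotes, @, and slash characters".
_FORBIDDEN = set(" \"'@/")


def password_problems(password: str) -> List[str]:
    """Return the Master Password rules that `password` breaks (empty if none)."""
    problems = []
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("needs an uppercase letter")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("needs a lowercase letter")
    if not any(c in string.digits for c in password):
        problems.append("needs a digit")
    bad = sorted(_FORBIDDEN.intersection(password))
    if bad:
        problems.append("contains forbidden characters: " + " ".join(repr(c) for c in bad))
    return problems


if __name__ == "__main__":
    for candidate in ["Matillion2024", "matillion", "Mat@llion1"]:
        print(candidate, password_problems(candidate) or "OK")
```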

Once you have entered values for all parameters, tick the I acknowledge that AWS CloudFormation might create IAM resources box under the Capabilities heading.
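If you script the deployment instead of using the console, the acknowledgement box corresponds to passing `CAPABILITY_IAM` in the `Capabilities` argument of CloudFormation's `CreateStack` call. A sketch, assuming boto3 and configured AWS credentials; the helper names are hypothetical, and the real parameter keys are defined by the Matillion template rather than by this code.

```python
from typing import Any, Dict


def build_create_stack_request(
    stack_name: str, template_url: str, parameters: Dict[str, str]
) -> Dict[str, Any]:
    """Keyword arguments for CloudFormation's CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # API counterpart of the "might create IAM resources" acknowledgement.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def create_stack(request: Dict[str, Any], region: str) -> str:
    """Submit the request and return the new stack's ID."""
    import boto3  # requires boto3 and AWS credentials

    client = boto3.client("cloudformation", region_name=region)
    return client.create_stack(**request)["StackId"]
```

For example, `build_create_stack_request("matillion-etl-ha", template_url, {"InstanceType": "m5.large"})` builds a request with one parameter; `InstanceType` here is a hypothetical key, so use the keys listed by the template.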

Click Create stack. If any fields fail validation, the console displays the details at the top of the Quick create stack page. If the form is complete, the AWS console redirects you to your newly created stack and opens its Events tab.

Events tab
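The Events tab shows the same stack events that CloudFormation's `DescribeStackEvents` API returns. A sketch of waiting for stack creation from a script and, if it fails, printing the resources that failed, assuming boto3 and configured AWS credentials; `failed_events` and `wait_for_stack` are hypothetical helper names.

```python
from typing import Any, Dict, List


def failed_events(events: List[Dict[str, Any]]) -> List[str]:
    """Summarize stack events whose status ends in _FAILED (e.g. CREATE_FAILED)."""
    return [
        f"{event['LogicalResourceId']}: {event.get('ResourceStatusReason', 'no reason given')}"
        for event in events
        if event["ResourceStatus"].endswith("_FAILED")
    ]


def wait_for_stack(stack_name: str, region: str) -> None:
    """Block until the stack reaches CREATE_COMPLETE, reporting failures otherwise."""
    import boto3  # requires boto3 and AWS credentials
    from botocore.exceptions import WaiterError

    client = boto3.client("cloudformation", region_name=region)
    try:
        client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    except WaiterError:
        # Only the first page of events (most recent first) is inspected here.
        events = client.describe_stack_events(StackName=stack_name)["StackEvents"]
        for line in failed_events(events):
            print(line)
        raise
    print(f"{stack_name}: CREATE_COMPLETE")
```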


Launching your new Matillion ETL HA cluster

  1. Click the Resources tab of your stack. Two EC2 instances should be listed.
  2. Click the Outputs tab of your stack and click the ALB URL. This URL opens Matillion ETL in a new browser tab.
  3. Log in to Matillion ETL using your credentials.
  4. If you are a Hub customer, you are required to associate your Matillion ETL instance with the Hub. For more information, read Associating a Matillion ETL Instance.

    Note

    Only one node (VM) in the cluster needs to be associated with the Hub, because the Hub license is stored in the repository database that all nodes share. Each node still requires access to the Hub so that it can report credit usage.

  5. Create a new project. For more information, read Create Project.

  6. To confirm that you are logged in to a clustered Matillion ETL instance, click the Cluster Info tab in the lower-right panel. This tab is only available on clustered instances. It shows each EC2 instance's address, current state, number of jobs, and creation timestamp.

    Cluster Info tab

    Note

    • User Configuration is not available in the Admin menu on a clustered instance.
    • Clicking Server Log in the Admin menu opens AWS CloudWatch instead of the standard server log.
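Steps 1 and 2 can also be checked from a script. The sketch below, assuming boto3 and configured AWS credentials, counts `AWS::EC2::Instance` resources in the stack and looks up a stack output by a fragment of its key. The helper names are hypothetical, and the output key for the ALB URL is defined by the template, so the `"url"` fragment used here is an assumption; check the Outputs tab for the actual key.

```python
from typing import Any, Dict, List, Optional


def count_ec2_instances(resources: List[Dict[str, Any]]) -> int:
    """Count EC2 instances in a ListStackResources StackResourceSummaries list."""
    return sum(1 for r in resources if r["ResourceType"] == "AWS::EC2::Instance")


def find_output(stack: Dict[str, Any], key_fragment: str) -> Optional[str]:
    """Return the value of the first output whose key contains key_fragment."""
    for output in stack.get("Outputs", []):
        if key_fragment.lower() in output["OutputKey"].lower():
            return output["OutputValue"]
    return None


def check_cluster(stack_name: str, region: str) -> None:
    import boto3  # requires boto3 and AWS credentials

    client = boto3.client("cloudformation", region_name=region)
    resources = client.list_stack_resources(StackName=stack_name)["StackResourceSummaries"]
    print("EC2 instances:", count_ec2_instances(resources))  # expect 2
    stack = client.describe_stacks(StackName=stack_name)["Stacks"][0]
    print("ALB URL:", find_output(stack, "url"))  # "url" is an assumed key fragment
```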

Securing your new Matillion ETL HA cluster

We recommend using firewall and VPC configurations to allow network connections only from expected host machines. This applies to all network deployments, including Matillion ETL. For HA clusters specifically:

  • The persistence database only needs to accept incoming connections from the Matillion ETL nodes, plus occasional connections for designated database maintenance work. No other incoming network connections are required.
  • The Matillion ETL nodes only need to accept HTTPS connections from the load balancer, SSH connections from hosts within your own infrastructure, and clustering connections (port 5701) from each other. No other incoming network connections are required.
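The recommendations above can be expressed as EC2 security group ingress rules. The sketch below builds `IpPermissions` entries for the EC2 `AuthorizeSecurityGroupIngress` API, assuming the load balancer, the nodes, and the database each have their own security group. The helper names, group IDs, and CIDR are placeholders; 5432 is only the PostgreSQL default (use the Port value you set for the RDS repository), and port 5701 for clustering comes from the Security Group parameter description.

```python
from typing import Any, Dict, List, Optional


def tcp_rule(port: int, description: str, *, source_group: Optional[str] = None,
             source_cidr: Optional[str] = None) -> Dict[str, Any]:
    """One TCP ingress permission from either a security group or a CIDR range."""
    if (source_group is None) == (source_cidr is None):
        raise ValueError("pass exactly one of source_group or source_cidr")
    rule: Dict[str, Any] = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
    if source_group is not None:
        rule["UserIdGroupPairs"] = [{"GroupId": source_group, "Description": description}]
    else:
        rule["IpRanges"] = [{"CidrIp": source_cidr, "Description": description}]
    return rule


def node_ingress(alb_sg: str, node_sg: str, admin_cidr: str) -> List[Dict[str, Any]]:
    """Ingress for the Matillion ETL nodes: HTTPS, SSH, and clustering only."""
    return [
        tcp_rule(443, "HTTPS from the load balancer", source_group=alb_sg),
        tcp_rule(22, "SSH from internal hosts", source_cidr=admin_cidr),
        tcp_rule(5701, "Clustering between nodes", source_group=node_sg),
    ]


def database_ingress(node_sg: str, db_port: int = 5432) -> List[Dict[str, Any]]:
    """Ingress for the repository database: the Matillion ETL nodes only."""
    return [tcp_rule(db_port, "Postgres from Matillion ETL nodes", source_group=node_sg)]


def apply_rules(group_id: str, rules: List[Dict[str, Any]], region: str) -> None:
    import boto3  # requires boto3 and AWS credentials

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=rules)
```

Keeping each source as a security group reference rather than a CIDR means the rules keep working if node IP addresses change; temporary access for database maintenance can be added as a separate rule and revoked afterwards.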