Quick guide for deploying a Streaming agent in AWS
Use this guide to add a Streaming agent in Data Loader and then deploy that agent in Amazon Web Services (AWS). Creating and deploying an agent are required steps to set up a CDC pipeline in Data Loader. There are multiple methods for deploying an agent. This guide will demonstrate how to do this using the following advanced template methods:
- Progress through the Quick start agent process (recommended).
- Manually download the agent template.
For best performance, choose an AWS region that is geographically close to your Hub account region.
Create a Streaming agent in Data Loader
- Register and log in to the Hub.
- The My Accounts page will be displayed, where you will see a list of the accounts you have already created or joined. At the bottom of your list of accounts, click Add new account. For more information, read Create an Account.
Each Hub account can generate one unique platform key that your Streaming agent will use to communicate with Data Loader. With this in mind, create the Streaming agent in the account that matches the platform key you want to use. For more information, read Platform Keys.
- Click Load data on the What do you want to do today? page.
- On the Data Loader dashboard, scroll to the lower-right of the UI and choose your region.
- In the Pipelines dashboard, select Agents in the left sidebar and click Add agent.
- Give your agent a sensible Agent name and Description. Click Continue.
- Since this guide is for AWS, click AWS as your cloud provider.
- Click CloudFormation as the service you want to provision and deploy your cloud resources from, for the Streaming agent installation.
- The Agent setup page will be displayed with useful information and prerequisites you need to know before you can deploy your agent. Please also note on this page the following environment variable values:
- ID_AGENT: This value is unique per agent.
- ID_ORGANIZATION: This value is unique per agent.
- PLATFORM_WEBSOCKET_ENDPOINT: This value is unique for the Data Loader region (US or EU).
- Manage key pair: This is a required and generated value. The key pair ensures Streaming agents can communicate securely between your VPC and Matillion's Data Productivity Cloud platform. If you haven't generated a platform secret for your account yet, Data Loader will prompt you to do so when creating a CDC pipeline. You need to store this value in AWS Secrets Manager where your Streaming agent can access it. For security reasons, this key pair can only be generated and shown once per account, so make sure to copy and save it for future use. You can revisit this page if required. Follow these last steps to submit your key pair:
- Check the I have saved the private key in AWS Secrets Manager and made a note of the secret name checkbox.
- Click Submit key pair.
- If a key pair has been configured, click Return to agent list where you can begin the process of adding an agent.
- The Agent containers page will list all current agents in your Hub account, including the one you have just created. Click the Agent setup icon next to the intended agent, and refer to the following sections to Deploy the Streaming agent using AWS quick create stack or Deploy the Streaming Agent in AWS using the advanced template, respectively.
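The key pair step above asks you to save the private key in AWS Secrets Manager. The following is a minimal sketch of how that could be scripted, not Matillion's own tooling. It assumes boto3 is installed, AWS credentials are configured, and the agent reads the key as a plaintext PEM string; confirm the expected secret format on the Agent setup page. The function names are hypothetical, and the secret name `agent-rsa` is taken from the PlatformKeyName description later in this guide.

```python
def build_secret_request(secret_name: str, private_key_pem: str) -> dict:
    """Build keyword arguments for the Secrets Manager CreateSecret call."""
    if "PRIVATE KEY" not in private_key_pem:
        raise ValueError("expected a PEM-encoded private key")
    # Normalise trailing whitespace so the stored PEM ends with one newline.
    return {"Name": secret_name, "SecretString": private_key_pem.strip() + "\n"}


def store_private_key(secret_name: str, private_key_pem: str, region_name: str) -> str:
    """Store the key in Secrets Manager and return the ARN of the new secret."""
    import boto3  # third-party dependency, imported lazily (assumption: installed)

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.create_secret(**build_secret_request(secret_name, private_key_pem))
    return response["ARN"]
```

Create the secret in the same AWS region you plan to deploy the agent in, and note the secret name, because the stack parameters ask for it.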
Deploy the Streaming agent using AWS quick create stack
It's recommended that you use the AWS CloudFormation ECS Fargate advanced template's quick create stack method to deploy your Streaming agent. Follow these steps to begin the deployment.
- Before continuing with the following steps, make sure you have followed the instructions in Create a Streaming agent in Data Loader.
- Click the Agent setup icon next to the agent you have just created.
- You should have already read and made note of the prerequisite information you need to continue. Choose AWS as your cloud provider, and select the CloudFormation service you want to use to provision and deploy your cloud resources. You will return to the Agent setup page. This time, scroll down to the Quick start (recommended) heading, and click the CloudFormation ECS Fargate Quick Create link.
- If you're already logged in to your AWS account, the aforementioned link will open the Quick create stack page in the AWS console.
Make sure the region you're working in is either eu or us, depending on the Data Loader region you are building the pipeline within. In the AWS console, you must choose the same region.
- Give a unique Stack name and enter parameter information for your stack. For more information, read Quick create stack parameters.
- Once you've completed your stack's parameters, check the capabilities and transforms checkboxes to acknowledge that transforms might require certain access capabilities.
- Click Create stack.
In Data Loader, the status of your new Streaming agent will display as Connected, and the Add Pipeline button will appear. To add a pipeline to your agent, refer to CDC pipeline overview.
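If the agent does not show as Connected, it can help to confirm that the stack itself finished creating. The sketch below is one way to check this from a script, assuming boto3 is installed and credentials are configured. The function names are hypothetical; the status strings are standard CloudFormation stack statuses.

```python
def classify_stack_status(status: str) -> str:
    """Map the status of a newly created stack to 'ready', 'in_progress', or 'failed'."""
    if status == "CREATE_COMPLETE":
        return "ready"
    if status == "CREATE_IN_PROGRESS":
        return "in_progress"
    # CREATE_FAILED and every ROLLBACK_* status mean the creation did not succeed.
    return "failed"


def wait_for_stack(stack_name: str, region_name: str) -> str:
    """Block until the stack is created, then return its final status."""
    import boto3  # third-party dependency, imported lazily (assumption: installed)

    cfn = boto3.client("cloudformation", region_name=region_name)
    # The waiter polls the stack and raises a WaiterError if creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return stack["StackStatus"]
```

A stack that reports `ready` only confirms that the AWS resources exist; the Connected status in Data Loader remains the check that the agent can reach the platform.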
Deploy the Streaming Agent in AWS using the advanced template
You can deploy a Streaming agent manually using the CloudFormation ECS Fargate Basic Template or the CloudFormation ECS Fargate Advanced Template. Follow these steps to deploy your Streaming agent using the downloaded CloudFormation ECS Fargate Advanced Template.
- Before continuing with the following steps, make sure you have followed the instructions in Create a Streaming agent in Data Loader.
- Click the Agent setup icon next to the agent you have just created.
- You should have already read and made note of the prerequisite information you need to continue. Choose AWS as your cloud provider, and select the CloudFormation service you want to use to provision and deploy your cloud resources. You will return to the Agent setup page. This time, scroll down to the Template heading, and click the CloudFormation ECS Fargate Advanced Template link.
- Download the CloudFormation template.
- Log in to the AWS console.
- Navigate to the Region drop-down and select the region in which you want to deploy the Streaming agent.
Make sure the region you're working in is either eu or us, depending on the Data Loader region you are building the pipeline within. In the AWS console, you must choose the same region.
- Navigate to CloudFormation, click the Create stack drop-down menu, and select With new resources (standard). A four-step wizard for configuring your stack will be displayed.
- In step 1, select the Template is ready radio button, then select the Upload a template file radio button. Click Choose file, and upload the CloudFormation template you downloaded earlier. Click Next to continue.
- In steps 2 and 3 of the wizard, Specify stack details and Configure stack options, give the stack an appropriate Stack name and use the drop-down menus and text fields to complete the remaining settings. Your uploaded template will autofill some of the parameter values. For more information, read CloudFormation ECS Fargate Advanced template parameters.
- Click Next to display step 4, Review [StackName]. Review the information you've entered, then check the required capabilities and transforms checkboxes to acknowledge that transforms might require certain access capabilities.
- Click Submit. The stack creation will then begin and should complete in approximately five minutes.
- When the stack creation is complete, the agent container will be deployed as an AWS Elastic Container Service (ECS) Cluster. The CloudFormation Template also creates resources in IAM, S3, and CloudWatch Logs.
In Data Loader, the status of your new Streaming agent will display as Connected, and the Add Pipeline button will appear. To add a pipeline to your agent, refer to CDC pipeline overview.
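The console steps above could also be scripted. The following is a rough sketch rather than an official deployment method, assuming boto3 is installed, credentials are configured, and the downloaded template is small enough to pass inline as a template body. The function names are hypothetical, and the parameter keys you pass must match the ones declared in your downloaded template.

```python
# Capabilities acknowledged in the console's final review step.
CAPABILITIES = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]


def to_stack_parameters(values: dict) -> list:
    """Convert a plain dict into CloudFormation's ParameterKey/ParameterValue list."""
    parameters = []
    for key, value in values.items():
        # List-typed parameters (subnets, security groups) are passed comma-separated.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        parameters.append({"ParameterKey": key, "ParameterValue": str(value)})
    return parameters


def create_agent_stack(stack_name: str, template_path: str, values: dict, region_name: str) -> str:
    """Create the stack from a local template file and return the stack ID."""
    import boto3  # third-party dependency, imported lazily (assumption: installed)

    with open(template_path, encoding="utf-8") as handle:
        template_body = handle.read()
    cfn = boto3.client("cloudformation", region_name=region_name)
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_stack_parameters(values),
        Capabilities=CAPABILITIES,
    )
    return response["StackId"]
```

For example, `to_stack_parameters({"AgentId": "...", "Subnets": ["subnet-a", "subnet-b"]})` produces the list that `create_stack` expects, with the two subnet IDs joined by a comma.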
Quick create stack parameters
Define the following parameters in your quick create stack template. The values from the Environment Variables table in the Agent setup document will auto populate the parameter values in the UI. Manually enter custom values for the remaining parameters.
Parameters | Description |
---|---|
Matillion Region | Select the region for resources to be created in, either europe or usa. |
VPC Id | Select the Id of an existing VPC, usually the one hosting the database. For more information, refer to Your VPCs. |
VPC Subnet Ids | Select the VPC Subnets to use. Choose at least one from a list of your chosen VPC. |
Database VPC Security Groups | Choose at least one security group associated with your database. For more information, refer to Security Groups, Resources and databases, respectively. |
Assign Agent a Public IP Address | Select Yes to assign a public IP address to the agent. Select No to route the agent through an existing NAT gateway. |
RSA Private Key | Choose whether to create a private key for your stack manually or automatically. Fill in the fields associated with your chosen method. For more information, read AWS Secrets Manager. |
Matillion Organization Id | The value of the environment variable ID_ORGANIZATION, copied from the Agent setup page when creating your Streaming agent in Data Loader. The template will autofill this value in the UI. |
Matillion Agent Id | The value of the environment variable ID_AGENT, copied from the Agent setup page when creating your Streaming agent in Data Loader. The template will autofill this value in the UI. |
Optional Configuration | These parameters are optional. Give the name of an existing ECS cluster to deploy into, or leave this field blank to create a cluster for the service. Give a comma-separated list of S3 buckets to manage. |
Optional MySQL Support | These parameters are also optional. For more information, refer to MySQL Connector. |
Advanced agent IAM (access control) | Optional parameters where you can enter a comma-separated list of the following: KMS key access, Secrets Manager secret access, S3 bucket access, and S3 object access. |
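The Assign Agent a Public IP Address choice depends on how the chosen subnet reaches the internet. As an illustration only, the sketch below inspects the routes of the subnet's route table and suggests a value. It assumes the route dictionaries follow the shape returned by the EC2 DescribeRouteTables API (for example through boto3), and the function name is hypothetical.

```python
def suggest_public_ip(routes: list):
    """Suggest 'Yes', 'No', or None based on the subnet's default IPv4 route."""
    for route in routes:
        # Only the default route decides how traffic leaves the subnet.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State", "active") != "active":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "Yes"  # internet gateway: the agent needs a public IP
        if route.get("NatGatewayId") or route.get("InstanceId"):
            return "No"  # NAT gateway or NAT instance handles outbound traffic
    return None  # no usable default route was found


example_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example", "State": "active"},
]
print(suggest_public_ip(example_routes))
```

A result of `None` means the subnet has no default route through an internet gateway or NAT, so it is probably not a suitable subnet for the agent. Subnets without an explicit route table association use the VPC's main route table.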
CloudFormation ECS Fargate advanced template parameters
The parameters in the following table all belong to the advanced template. When you download the template from the documentation and upload it into the AWS console, all parameter fields in the UI will require custom values when you create or update your AWS stack.
Parameters | Description |
---|---|
Stack Name | A unique name given to the created stack. |
AgentId | The value of the environment variable ID_AGENT, copied from the Agent setup page when creating your Streaming agent in Data Loader. |
PublicIP | ENABLED if the subnet running the Streaming agent uses an internet gateway for internet access. DISABLED if the subnet uses a NAT gateway/instance for routing traffic to the internet. Note that you cannot deploy the Streaming agent on a subnet that doesn't have an internet gateway or a NAT gateway/instance. Please refer to Subnets and Security Groups for more information. |
Bucket | An arbitrary name for the new target bucket for CDC output. Must be unique. The template should autofill this value in the AWS console. |
ClusterName | An arbitrary name for the new ECS Fargate cluster that will be created from the template. This is where your agent is hosted. Must be unique. The template should autofill this value in the AWS console. |
LogRetention | Choose how many days agent logs should be kept. |
OrganizationID | The value of the environment variable ID_ORGANIZATION, copied from the Agent setup page when creating your Streaming agent in Data Loader. |
PrivateKey | The string for your RSA private key. |
PrivateKeySecretName | The string for your RSA private key secret name. |
PlatformKeyName | The name of the secret generated to hold your Platform Key. If you are following our recommended install, this will be agent-rsa. |
PlatformWebSocketEndpoint | The value of the environment variable PLATFORM_WEBSOCKET_ENDPOINT, copied from the Agent setup page when creating your Streaming agent in Data Loader. |
Region | The name of the AWS Region you want to create these resources in. Note that it's usually best to keep all your AWS resources in the same region when possible. |
RoleName | An arbitrary name for the new IAM Task Role used by the image to access AWS resources. The template should autofill this value in the AWS console. |
SecurityGroups | If you have existing security groups, select them from the drop-down menu. Otherwise, create a new security group with the required outbound rules as follows. 1. When you first create a security group, it has an outbound rule that allows all outbound traffic from the resource. You can remove the rule and add outbound rules that allow specific outbound traffic only. If your security group has no outbound rules, no outbound traffic is allowed. 2. When you add inbound rules for ports 22 (SSH) or 3389 (RDP) so that you can access your EC2 instances, authorize only specific IP address ranges. If you specify 0.0.0.0/0 (IPv4) and ::/0 (IPv6), this enables anyone to access your instances from any IP address using the specified protocol. |
Hardware | Agent CPU and RAM Allocation based on expected workload. For more information, read Security Groups. |
ServiceName | The name of the Elastic Container Service (ECS) that you want these tasks to run under. The template should autofill this value in the AWS console. |
VPC | Id of an existing VPC, usually the one hosting the database. For more information, read VPCs. |
Subnets | If you have any existing Subnet IDs that can be used, you can select one from the drop-down menu or you can create a new one. When you create a subnet, you specify its IP addresses. Depending on the configuration of the VPC, set IPv4 only or IPv6 only. Each subnet must be associated with a route table, which specifies the allowed routes for outbound traffic leaving the subnet. Each subnet must be associated with a network ACL. Every subnet that you create is automatically associated with the default network ACL for the VPC. For more information, read our documentation on Subnets and the documentation provided by AWS on Subnets. |
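The SecurityGroups guidance about not opening ports 22 (SSH) or 3389 (RDP) to every address can be checked mechanically. The following is a small sketch under stated assumptions: the rules follow the `IpPermissions` shape returned by the EC2 DescribeSecurityGroups API, and the function name is hypothetical.

```python
ADMIN_PORTS = (22, 3389)  # SSH and RDP
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # any IPv4 address, any IPv6 address


def open_admin_rules(ip_permissions: list) -> list:
    """Return (port, cidr) pairs where SSH or RDP is reachable from any address."""
    findings = []
    for permission in ip_permissions:
        protocol = permission.get("IpProtocol")
        if protocol not in ("tcp", "-1"):  # "-1" means all protocols and ports
            continue
        cidrs = [r["CidrIp"] for r in permission.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in permission.get("Ipv6Ranges", [])]
        exposed = [cidr for cidr in cidrs if cidr in OPEN_CIDRS]
        low, high = permission.get("FromPort"), permission.get("ToPort")
        for port in ADMIN_PORTS:
            in_range = low is not None and high is not None and low <= port <= high
            if protocol == "-1" or in_range:
                findings.extend((port, cidr) for cidr in exposed)
    return findings
```

An empty result means none of the inbound rules expose SSH or RDP to the whole internet; any returned pair is a rule worth narrowing to a specific IP address range.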