Connect to Amazon S3

Matillion CDC can load data from your pipelines into an Amazon S3 bucket for storage. Follow the steps on this page to configure Amazon S3 as a destination.


Amazon S3 prerequisites

Before you can use Amazon S3 as a destination, make sure the following prerequisites are met. They ensure that the pipeline can connect to your Amazon S3 bucket and transfer data to it.

  • An Amazon Web Services (AWS) account. Signing up is free; create an account if you don't have one already.
  • Permissions to create and manage S3 buckets in AWS. Your AWS user must be able to create a bucket if one doesn't already exist, add/modify bucket policies, and upload files to the bucket.
  • An IAM role for the Agent container that has the s3:PutObject permission on the S3 bucket and on the prefix that the pipeline will use as its destination.
  • An active Amazon S3 bucket.
  • A unique prefix name for each pipeline.
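The PutObject prerequisite above can be sketched as an IAM identity policy for the agent's role. This is a minimal illustration rather than a policy taken from Matillion: the function name `agent_put_policy` and the bucket and prefix names are hypothetical, and the sketch assumes the standard `aws` partition.

```python
import json


def agent_put_policy(bucket: str, prefix: str) -> dict:
    """Build an identity policy that allows uploads under one prefix of one bucket."""
    # Strip surrounding slashes so the resource ARN has exactly one slash between parts.
    clean_prefix = prefix.strip("/")
    object_path = f"{clean_prefix}/*" if clean_prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level actions are granted on object ARNs, hence the trailing /*.
                "Resource": f"arn:aws:s3:::{bucket}/{object_path}",
            }
        ],
    }


if __name__ == "__main__":
    # "example-cdc-bucket" and "pipeline-orders" are hypothetical names.
    print(json.dumps(agent_put_policy("example-cdc-bucket", "pipeline-orders"), indent=4))
```

The printed JSON can be attached to the agent's role as an inline policy. Scoping the resource to the prefix, rather than the whole bucket, keeps one pipeline's role from writing into another pipeline's location.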

Connecting to Amazon S3

Select Destination

  • After you configure the source during CDC pipeline creation, you are prompted to choose the destination to load your data into.
  • On the Choose destination page, select Amazon S3.

Configure Amazon S3 Connection Settings

Specify the following settings on the Connect to AmazonS3 Destination page:

Property | Description
Bucket | The name of the Amazon S3 bucket you want to use as a destination. Find your bucket name in the AWS Management Console under Services > S3.
Prefix | The name of the 'folder', or location within the S3 bucket, where all CDC data for this pipeline is saved. Multiple agents can use the same bucket with different prefixes.

Note

A pipeline prefix must be unique.
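One way to catch a prefix clash before creating a pipeline is to compare the prefixes you plan to use. The sketch below is an illustration only: the function names are hypothetical, and treating `orders` and `orders/` as the same prefix is an assumption, not documented Matillion behaviour.

```python
from collections import Counter


def normalise_prefix(prefix: str) -> str:
    """Strip surrounding slashes so 'orders/' and '/orders' compare equal (an assumption)."""
    return prefix.strip("/")


def duplicate_prefixes(prefixes: list[str]) -> list[str]:
    """Return the normalised prefixes that appear more than once."""
    counts = Counter(normalise_prefix(p) for p in prefixes)
    return sorted(p for p, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical prefixes for three pipelines sharing one bucket.
    print(duplicate_prefixes(["orders", "orders/", "customers"]))
```

An empty result means every pipeline in the list has its own prefix.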

Test Connection

You can test your connection by clicking Test connection. If the test is successful, click Test and Continue.
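If the test fails, it can help to check write access to the prefix outside the product. The sketch below is not how Matillion's own test works (that is not described here). It is an independent check that uploads a small marker object with the S3 `put_object` call. The function name and the marker file name are hypothetical, and `s3_client` is assumed to be a boto3 S3 client created with the agent role's credentials.

```python
def check_prefix_is_writable(s3_client, bucket: str, prefix: str) -> str:
    """Upload a tiny marker object under the prefix and return its key."""
    # "connection-check.txt" is a hypothetical marker name, not a Matillion file.
    clean_prefix = prefix.strip("/")
    key = f"{clean_prefix}/connection-check.txt" if clean_prefix else "connection-check.txt"
    # put_object raises an exception (for example on AccessDenied) when the caller
    # lacks s3:PutObject on this key, so reaching the return means the write worked.
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"ok")
    return key
```

With boto3 installed, a call might look like `check_prefix_is_writable(boto3.client("s3"), "example-cdc-bucket", "pipeline-orders")`. The marker object remains in the bucket afterwards, so delete it manually if you don't want it stored alongside pipeline data.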


Cross Account S3 access

If your CDC agent runs in Account A, you can load your change data into an S3 bucket in Account B.

To do this, grant your agent task role the following permissions in the bucket policy of the Account B bucket.

  • Allow GetBucketLocation and ListBucket on your bucket.
  • Allow PutObject, GetObject, and DeleteObject on the contents of your bucket.

The following example bucket policy grants these permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Bucket Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "<account-a-agent-task-role-arn>"
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "<account-b-bucket-arn>"
        },
        {
            "Sid": "Bucket Content Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "<account-a-agent-task-role-arn>"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "<account-b-bucket-arn>/*"
        }
    ]
}
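When several buckets or agents are involved, the policy above can be generated rather than edited by hand. The following is a minimal sketch that reproduces the same two statements from the two ARNs. The function name `cross_account_bucket_policy` and the ARNs in the usage line are hypothetical.

```python
import json


def cross_account_bucket_policy(task_role_arn: str, bucket_arn: str) -> dict:
    """Build the bucket policy shown above for one agent task role and one bucket."""
    principal = {"AWS": task_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "Bucket Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the objects, hence the /* suffix.
                "Sid": "Bucket Content Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Both ARNs below are hypothetical placeholders.
    policy = cross_account_bucket_policy(
        "arn:aws:iam::111111111111:role/example-agent-task-role",
        "arn:aws:s3:::example-account-b-bucket",
    )
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket policy editor for the Account B bucket.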

To find your task role ARN, navigate to your ECS clusters list in Account A and follow these steps:

  1. In the search, type "ECS" and choose Elastic Container Service.
  2. Select your ECS cluster.

    Finding Task Role ARN: Step 1

  3. In the Services tab, locate and click your cluster's task definition.

    Finding Task Role ARN: Step 2

  4. In the Overview panel, locate and click the task role.

    Finding Task Role ARN: Step 3

  5. In the Summary panel, locate and copy your ARN.

    Finding Task Role ARN: Step 4
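The console steps above can also be approximated in code. The sketch below assumes the agent runs as an ECS service and that `ecs_client` is a boto3 ECS client with credentials for Account A. It uses the ECS `describe_services` and `describe_task_definition` calls. The function name and the cluster and service names are hypothetical.

```python
def find_task_role_arn(ecs_client, cluster: str, service: str) -> str:
    """Follow service -> task definition -> task role, as in the console steps."""
    services = ecs_client.describe_services(cluster=cluster, services=[service])["services"]
    if not services:
        raise LookupError(f"service {service!r} not found in cluster {cluster!r}")

    # Each service records the ARN of the task definition it currently runs.
    task_definition_arn = services[0]["taskDefinition"]
    task_definition = ecs_client.describe_task_definition(
        taskDefinition=task_definition_arn
    )["taskDefinition"]

    # taskRoleArn is absent when the task definition has no task role.
    task_role_arn = task_definition.get("taskRoleArn")
    if not task_role_arn:
        raise LookupError(f"task definition {task_definition_arn!r} has no task role")
    return task_role_arn
```

With boto3 installed, a call might look like `find_task_role_arn(boto3.client("ecs"), "example-cluster", "example-cdc-agent")`. The returned ARN is the value to use as the principal in the bucket policy.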

Similarly, to find your bucket ARN, navigate to your S3 buckets list in Account B, select the radio button next to the bucket, and click Copy ARN:

Finding Bucket ARN
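A bucket ARN can also be derived from the bucket name, because S3 bucket ARNs leave the region and account fields empty. The sketch below assumes the standard `aws` partition by default. The function name and bucket name are hypothetical.

```python
def bucket_arn(bucket_name: str, partition: str = "aws") -> str:
    """Build an S3 bucket ARN; the region and account fields are left empty."""
    return f"arn:{partition}:s3:::{bucket_name}"


if __name__ == "__main__":
    # "example-account-b-bucket" is a hypothetical bucket name.
    print(bucket_arn("example-account-b-bucket"))
```

Buckets in other partitions, such as AWS GovCloud or the China regions, need a different `partition` value, so copying the ARN from the console remains the safer option when in doubt.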