
Streaming to an Amazon S3 destination

Streaming pipelines can use Amazon S3 as a direct storage destination. This page describes the prerequisites, process, and other considerations of using Amazon S3 as a Streaming pipeline destination.

Before configuring this destination, read Create a Streaming pipeline.


Prerequisites

  • You need an AWS account.
  • You need permissions to create and manage S3 buckets in AWS.
  • The IAM role used by the Streaming agent needs PutObject permission on the S3 bucket and prefix that the pipeline will use as its destination.
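
As an illustration of the last prerequisite, the following Python sketch builds an identity-based IAM policy document that allows PutObject under a given bucket and prefix. The function name, the Sid, and the example bucket and prefix are hypothetical, and whether the agent role needs further permissions depends on your setup, so treat this as a starting point rather than a definitive policy.

```python
import json


def put_object_policy(bucket: str, prefix: str) -> dict:
    """Build a policy document allowing s3:PutObject under bucket/prefix.

    Hypothetical helper for illustration; not part of any product API.
    """
    # Strip stray slashes so the resource ARN has exactly one separator.
    clean_prefix = prefix.strip("/")
    if clean_prefix:
        resource = f"arn:aws:s3:::{bucket}/{clean_prefix}/*"
    else:
        resource = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowStreamingAgentPut",  # assumed name
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Example values are placeholders; substitute your own bucket and prefix.
    print(json.dumps(put_object_policy("my-streaming-bucket", "pipelines/orders"), indent=4))
```

The printed JSON can be attached to the agent's IAM role in the AWS Console or with your usual infrastructure tooling.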

Destination configuration

Refer to this section to complete the Destination configuration section of the Create Streaming pipeline screen.

Bucket = string

The name of the Amazon S3 bucket you want to use as a destination. Find your bucket name in the AWS Management Console under Services → S3.

Note

Enter the bucket name only. This must be the base directory of the S3 bucket, not a sub-directory.


Prefix = string

The prefix represents the "folder" within the S3 bucket where all streaming data for this pipeline is saved. Using a unique prefix avoids naming conflicts and allows the same bucket to be reused for multiple pipelines.
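
The relationship between the two properties can be sketched in Python. S3 has no real folders: a prefix is simply the leading part of each object key. The helpers below are hypothetical, and both the slash normalisation and the example file name are assumptions made for illustration. The actual object layout written by a pipeline is not described here.

```python
from typing import Tuple


def check_destination(bucket: str, prefix: str) -> Tuple[str, str]:
    """Reject bucket values that are paths; tidy slashes on the prefix.

    Hypothetical helper: it only mirrors the rules stated above.
    """
    # The Bucket property must be a bare bucket name, not a URL or sub-directory.
    if "://" in bucket or "/" in bucket:
        raise ValueError(f"Bucket must be a bare bucket name, got: {bucket!r}")
    # Assumption: leading/trailing slashes on the prefix are not meaningful.
    return bucket, prefix.strip("/")


def example_key(prefix: str, file_name: str) -> str:
    """Show how a prefix forms the start of an S3 object key."""
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/{file_name}" if clean_prefix else file_name


if __name__ == "__main__":
    bucket, prefix = check_destination("my-streaming-bucket", "pipelines/orders/")
    # Two pipelines sharing a bucket stay separate because their prefixes differ.
    print(bucket, example_key(prefix, "part-0001.json"))
    print(bucket, example_key("pipelines/customers", "part-0001.json"))
```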

This completes the destination setup. The next groups of properties on this screen are for source setup and pipeline configuration.


Schema drift

Schema drift is supported for this destination. Read Schema drift to learn more.


Cross-account S3 access

If you have a Streaming agent set up in one AWS account (referred to as Account A), you can load change data into an S3 bucket in another AWS account (referred to as Account B). To enable this, grant your agent task role the following permissions in the bucket policy of the Account B bucket:

  • Allow GetBucketLocation and ListBucket on your bucket.
  • Allow PutObject, GetObject, and DeleteObject on the contents of your bucket.

An example bucket policy is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Bucket Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "<account-a-agent-task-role-arn>"
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "<account-b-bucket-arn>"
        },
        {
            "Sid": "Bucket Content Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "<account-a-agent-task-role-arn>"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "<account-b-bucket-arn>/*"
        }
    ]
}
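
If you prefer to generate this policy rather than edit the placeholders by hand, a minimal Python sketch is shown below. It reproduces the two statements from the example above. The function name and the example ARNs are hypothetical and should be replaced with your own values.

```python
import json


def build_bucket_policy(agent_task_role_arn: str, bucket_arn: str) -> dict:
    """Return the cross-account bucket policy shown above as a dict.

    Hypothetical helper for illustration only.
    """
    principal = {"AWS": agent_task_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "Bucket Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to everything under the bucket.
                "Sid": "Bucket Content Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs; use the values copied from Account A and Account B.
    policy = build_bucket_policy(
        "arn:aws:iam::111111111111:role/example-agent-task-role",
        "arn:aws:s3:::example-account-b-bucket",
    )
    print(json.dumps(policy, indent=4))
```

The output can be pasted into the bucket policy editor for the bucket in Account B.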

To find your task role ARN, open the AWS Console of Account A and follow these steps:

  1. In the search, type "ECS" and choose Elastic Container Service.
  2. Select your ECS cluster.
  3. In the Services tab, locate and click your cluster's task definition.
  4. In the Overview panel, locate and click the task role.
  5. In the Summary panel, locate and copy your ARN.

Similarly, to find your bucket ARN, go to your S3 buckets list in Account B, select the radio button next to the bucket, and click Copy ARN.
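
Before pasting the copied values into the bucket policy, it can help to check that each one has the expected shape, since swapping the two ARNs is an easy mistake to make. The Python sketch below uses regular expressions based on the general ARN formats for IAM roles and S3 buckets. The function names and example ARNs are hypothetical, and the check is a rough sanity test, not full validation.

```python
import re

# IAM role ARN: arn:<partition>:iam::<12-digit account id>:role/<path and name>
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")

# S3 bucket ARN: arn:<partition>:s3:::<bucket name>, with no region or account id.
# Bucket names are 3-63 characters: lowercase letters, digits, dots, and hyphens.
BUCKET_ARN = re.compile(r"^arn:aws[a-z-]*:s3:::[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the general shape of an IAM role ARN."""
    return ROLE_ARN.match(value) is not None


def looks_like_bucket_arn(value: str) -> bool:
    """Return True if value has the general shape of an S3 bucket ARN."""
    return BUCKET_ARN.match(value) is not None


if __name__ == "__main__":
    # Placeholder values; substitute the ARNs you copied.
    print(looks_like_role_arn("arn:aws:iam::111111111111:role/example-agent-task-role"))
    print(looks_like_bucket_arn("arn:aws:s3:::example-account-b-bucket"))
```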