Streaming to an Amazon S3 destination
Streaming pipelines can use Amazon S3 as a direct storage destination. This page describes the prerequisites, process, and other considerations of using Amazon S3 as a streaming pipeline destination.
You should have arrived here by first reading about the Create streaming pipeline wizard.
Prerequisites
- You need an AWS account.
- You need permissions to create and manage S3 buckets in AWS.
- The IAM role used by the streaming agent needs PutObject permissions for the S3 bucket and prefix that the pipeline uses as its destination.
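As a rough illustration of the last prerequisite, the sketch below builds an IAM identity policy that grants PutObject under one bucket and prefix. The function name, the Sid, and the bucket and prefix values are hypothetical, and the ARN format assumes the standard "aws" partition. Your agent may need more actions than this, so treat it as a starting point rather than the required policy.

```python
import json


def build_put_object_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM identity policy allowing s3:PutObject under bucket/prefix."""
    # Normalise the prefix so "cdc", "/cdc" and "cdc/" all give the same resource ARN.
    cleaned = prefix.strip("/")
    # Assumption: standard "aws" partition; other partitions use a different ARN prefix.
    if cleaned:
        resource = f"arn:aws:s3:::{bucket}/{cleaned}/*"
    else:
        resource = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowStreamingAgentWrites",  # hypothetical Sid
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names, for illustration only.
    policy = build_put_object_policy("my-streaming-bucket", "pipeline-a")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the agent's IAM role as an inline policy once the bucket and prefix are replaced with your own values.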
Destination configuration
Refer to this section to complete the Destination configuration section of the Create streaming pipeline wizard.
Bucket
= string
The name of the Amazon S3 bucket you want to use as a destination. Find your bucket name in the AWS Management Console under Services → S3.
Note
This must be the base directory of the S3 bucket. It is not currently possible to use a sub-directory.
Prefix
= string
The prefix is the name of the "folder" or a location within the S3 bucket that all streaming data for this pipeline should be saved to. You can have multiple agents using the same bucket with different prefixes.
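The Bucket and Prefix rules above can be checked before you type the values into the wizard. The sketch below is one way to do that: it rejects a Bucket value that contains a path, applies the general S3 bucket naming rules, and flags agents that share a prefix. Both helper names are hypothetical, and treating "pipeline-a" and "/pipeline-a/" as the same prefix is an assumption about how slashes are handled.

```python
import re

# General-purpose S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def check_bucket_name(bucket: str) -> list[str]:
    """Return a list of problems; an empty list means the value looks usable."""
    problems = []
    if "/" in bucket:
        # The Bucket field takes the bucket name only; any path belongs in Prefix.
        problems.append("Bucket must not contain a sub-directory.")
    elif not _BUCKET_RE.fullmatch(bucket):
        problems.append("Bucket does not look like a valid S3 bucket name.")
    return problems


def duplicate_prefixes(prefixes: list[str]) -> list[str]:
    """Return prefixes used more than once, ignoring leading and trailing slashes."""
    seen = set()
    duplicates = []
    for prefix in prefixes:
        cleaned = prefix.strip("/")  # assumption: slashes at the ends are not significant
        if cleaned in seen and cleaned not in duplicates:
            duplicates.append(cleaned)
        seen.add(cleaned)
    return duplicates


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(check_bucket_name("my-streaming-bucket/sub-dir"))
    print(duplicate_prefixes(["pipeline-a", "/pipeline-a/", "pipeline-b"]))
```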
Click Continue to advance to setting up your source database connection.
Schema drift
Schema drift is supported for this destination. Read Schema drift to learn more.
Cross-account S3 access
If your streaming agent runs in one AWS account (Account A), you can load your change data into an S3 bucket in another account (Account B). To do this, grant your agent task role the following access permissions in the bucket policy:
- Allow GetBucketLocation and ListBucket on your bucket.
- Allow PutObject, GetObject, and DeleteObject on the contents of your bucket.
An example bucket policy can be seen below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Bucket Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<account-a-agent-task-role-arn>"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": "<account-b-bucket-arn>"
    },
    {
      "Sid": "Bucket Content Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<account-a-agent-task-role-arn>"
      },
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "<account-b-bucket-arn>/*"
    }
  ]
}
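If you would rather generate this policy than edit the placeholders by hand, a minimal sketch follows. It fills the two ARNs into the same structure as the example policy. The function name and the sample ARNs are hypothetical and for illustration only.

```python
import json


def build_cross_account_bucket_policy(task_role_arn: str, bucket_arn: str) -> dict:
    """Return a bucket policy granting the Account A task role access to the bucket."""
    principal = {"AWS": task_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "Bucket Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to everything under the bucket.
                "Sid": "Bucket Content Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARNs; replace them with the values from your own accounts.
    policy = build_cross_account_bucket_policy(
        "arn:aws:iam::111111111111:role/agent-task-role",
        "arn:aws:s3:::my-streaming-bucket",
    )
    print(json.dumps(policy, indent=2))
```

Paste the printed JSON into the bucket policy editor for the Account B bucket.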
Your task role ARN can be found by navigating to your ECS clusters list in Account A and following the steps below:
- In the search, type "ECS" and choose Elastic Container Service.
- Select your ECS cluster.
- In the Services tab, locate and click your cluster's task definition.
- In the Overview panel, locate and click the task role.
- In the Summary panel, locate and copy your ARN.
Similarly, your bucket ARN can be found by navigating to your S3 buckets list (Account B), then selecting the corresponding radio button and clicking Copy ARN.
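The same two ARNs can also be looked up programmatically. The sketch below assumes you pass in a boto3 ECS client created with Account A credentials, and that you know the name of the ECS service running the agent. The cluster and service names shown are hypothetical. The bucket ARN helper assumes the standard "aws" partition.

```python
def find_task_role_arn(ecs_client, cluster: str, service: str) -> str:
    """Look up the task role ARN for a service running in an ECS cluster."""
    services = ecs_client.describe_services(cluster=cluster, services=[service])["services"]
    if not services:
        raise LookupError(f"Service {service!r} not found in cluster {cluster!r}")
    # Each service record carries the ARN of its task definition.
    task_definition_arn = services[0]["taskDefinition"]
    task_definition = ecs_client.describe_task_definition(
        taskDefinition=task_definition_arn
    )["taskDefinition"]
    # taskRoleArn is absent when the task definition has no task role attached.
    if "taskRoleArn" not in task_definition:
        raise LookupError(f"Task definition {task_definition_arn!r} has no task role")
    return task_definition["taskRoleArn"]


def bucket_arn(bucket: str) -> str:
    """Build the ARN of an S3 bucket from its name."""
    # Assumption: standard "aws" partition.
    return f"arn:aws:s3:::{bucket}"


# Example usage (requires boto3 and AWS credentials for Account A):
#   import boto3
#   ecs = boto3.client("ecs")
#   print(find_task_role_arn(ecs, "my-cluster", "my-agent-service"))
#   print(bucket_arn("my-streaming-bucket"))
```

Passing the client in as a parameter keeps the lookup easy to exercise against a stub before you run it against a real account.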