Streaming to an Amazon S3 destination
Streaming pipelines can use Amazon S3 as a direct storage destination. This page describes the prerequisites, process, and other considerations of using Amazon S3 as a Streaming pipeline destination.
You should have arrived here by first reading Create a Streaming pipeline.
Prerequisites
- You need an AWS account.
- You need permissions to create and manage S3 buckets in AWS.
- You need to ensure that the IAM role used by the Streaming agent has PutObject permissions on the S3 bucket and on the prefix that the pipeline will use as its destination.
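To illustrate the PutObject prerequisite, the Python sketch below builds an identity-based IAM policy document that could be attached to the agent's role. It assumes the standard `aws` partition and grants only `s3:PutObject` on the objects under the destination prefix. The function name, `Sid`, bucket name, and prefix are hypothetical, and your environment may grant this permission in a different way.

```python
import json


def put_object_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM identity policy granting s3:PutObject under bucket/prefix.

    Hypothetical helper for illustration only; it assumes the "aws" partition.
    """
    # Normalize the prefix so "orders", "/orders/" and "orders/" map to the
    # same key space; an empty prefix covers the whole bucket.
    clean_prefix = prefix.strip("/")
    key_pattern = f"{clean_prefix}/*" if clean_prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowStreamingWrites",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object ARNs take the form arn:aws:s3:::<bucket>/<key>.
                "Resource": f"arn:aws:s3:::{bucket}/{key_pattern}",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(put_object_policy("my-streaming-bucket", "orders"), indent=2))
```

Scoping the resource to `<prefix>/*` rather than the whole bucket keeps the role's write access limited to the folder the pipeline uses.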
Destination configuration
Use the properties below to complete the Destination configuration section of the Create Streaming pipeline screen.
Bucket
= string
The name of the Amazon S3 bucket you want to use as a destination. Find your bucket name in the AWS Management Console under Services → S3.
Note
Enter the bucket name only (the top level of the bucket), not a path to a sub-directory. To write to a folder within the bucket, use the Prefix property.
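As a quick pre-check of the Bucket value, the Python sketch below rejects values that include an `s3://` scheme or a sub-directory path, and applies a simplified subset of the S3 general purpose bucket naming rules. The helper name and messages are hypothetical, and the full AWS naming rules include further restrictions not checked here.

```python
import re

# Simplified subset of the S3 general purpose bucket naming rules: 3 to 63
# characters of lowercase letters, digits, dots, and hyphens, beginning and
# ending with a letter or digit. The full rules have further restrictions.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def check_bucket_field(value: str) -> str:
    """Return a problem description for a Bucket value, or "" if none is found.

    Hypothetical helper: flags a scheme or a sub-directory path, since the
    folder path belongs in the Prefix property instead.
    """
    if value.startswith("s3://"):
        return "Enter the bucket name only, without the s3:// scheme."
    if "/" in value:
        return "Enter the bucket name only; put the folder path in Prefix."
    if not _BUCKET_NAME.fullmatch(value):
        return "Not a valid S3 bucket name."
    return ""


if __name__ == "__main__":
    for candidate in ["my-streaming-bucket", "my-streaming-bucket/orders"]:
        print(candidate, "->", check_bucket_field(candidate) or "OK")
```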
Prefix
= string
The prefix represents the "folder" within the S3 bucket where all streaming data for this pipeline is saved. Using a unique prefix avoids naming conflicts and allows the same bucket to be reused for multiple pipelines.
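To show why each pipeline should get its own prefix, the Python sketch below checks whether two prefixes in the same bucket overlap. It assumes, conservatively, that nested prefixes such as `sales` and `sales/orders` should also be avoided, and that an empty prefix means the bucket root. `prefixes_overlap` is a hypothetical helper, not part of the product.

```python
def normalize_prefix(prefix: str) -> str:
    """Strip leading and trailing slashes so "orders/" and "/orders" compare equal."""
    return prefix.strip("/")


def prefixes_overlap(a: str, b: str) -> bool:
    """Return True if two pipelines writing to the same bucket would share a folder.

    Hypothetical helper. Treats identical prefixes, nested prefixes
    ("sales" vs "sales/orders"), and an empty prefix (the bucket root)
    as overlapping.
    """
    a, b = normalize_prefix(a), normalize_prefix(b)
    if not a or not b or a == b:
        return True
    # Compare on "/" boundaries so "sales" does not match "sales2".
    return a.startswith(b + "/") or b.startswith(a + "/")


if __name__ == "__main__":
    existing = ["sales/orders", "inventory"]
    new_prefix = "sales"
    clashes = [p for p in existing if prefixes_overlap(new_prefix, p)]
    print("Overlaps:", clashes)
```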
This completes the destination setup. The next groups of properties on this screen are for source setup and pipeline configuration.
Schema drift
Schema drift is supported for this destination. Read Schema drift to learn more.
Cross-account S3 access
If you have a Streaming agent set up in one AWS account (referred to as Account A), you can load change data into an S3 bucket in another AWS account (referred to as Account B). To enable this, add a bucket policy to the Account B bucket that grants your agent's task role the following permissions:
- Allow GetBucketLocation and ListBucket on your bucket.
- Allow PutObject, GetObject, and DeleteObject on the contents of your bucket.
An example bucket policy is shown below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Bucket Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<account-a-agent-task-role-arn>"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": "<account-b-bucket-arn>"
    },
    {
      "Sid": "Bucket Content Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<account-a-agent-task-role-arn>"
      },
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "<account-b-bucket-arn>/*"
    }
  ]
}
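If you prefer to generate the example bucket policy rather than edit its placeholders by hand, the Python sketch below fills in the same two statements from the two ARNs: bucket-level actions on the bucket ARN, and object-level actions on every key under it. The account ID, role name, and bucket name in the usage lines are hypothetical; apply the output as the bucket policy on the Account B bucket.

```python
import json


def cross_account_bucket_policy(task_role_arn: str, bucket_arn: str) -> str:
    """Render the Account B bucket policy granting the Account A task role access.

    Mirrors the example policy: GetBucketLocation and ListBucket on the bucket,
    and PutObject, GetObject, and DeleteObject on "<bucket-arn>/*".
    """
    principal = {"AWS": task_role_arn}
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Bucket Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                "Sid": "Bucket Content Permissions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(cross_account_bucket_policy(
        "arn:aws:iam::111111111111:role/streaming-agent-task-role",  # hypothetical
        "arn:aws:s3:::account-b-streaming-bucket",  # hypothetical
    ))
```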
To find your task role ARN, sign in to the AWS Console for Account A and follow these steps:
- In the search bar, type "ECS" and choose Elastic Container Service.
- Select the ECS cluster that runs your Streaming agent.
- In the Services tab, locate your agent's service and click its task definition.
- In the Overview panel, locate and click the task role.
- In the Summary panel, locate and copy the ARN.
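The ARN you copy should be an IAM role ARN owned by Account A, not the ARN of the task definition itself. As an optional sanity check, the Python sketch below splits an ARN into its fields (`arn:partition:service:region:account-id:resource`) and confirms that it names an IAM role in the expected account. Both helper names and the example account IDs are hypothetical.

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN of the form arn:partition:service:region:account-id:resource.

    maxsplit=5 keeps any colons inside the resource part (for example a
    task definition revision such as "task-definition/agent:3").
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {arn!r}")
    _, partition, service, region, account, resource = parts
    return {"partition": partition, "service": service, "region": region,
            "account": account, "resource": resource}


def looks_like_task_role(arn: str, account_a: str) -> bool:
    """True if arn is an IAM role ARN owned by account_a.

    Helps catch copying a task definition ARN (arn:aws:ecs:...) by mistake.
    """
    info = parse_arn(arn)
    return (info["service"] == "iam"
            and info["account"] == account_a
            and info["resource"].startswith("role/"))


if __name__ == "__main__":
    print(looks_like_task_role("arn:aws:iam::111111111111:role/ecsTaskRole",
                               "111111111111"))
```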
Similarly, to find your bucket ARN, open the S3 buckets list in the AWS Console of Account B, select the radio button next to your bucket, and click Copy ARN.
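Because S3 bucket ARNs leave the region and account fields empty, the bucket ARN can also be built directly from the bucket name. A minimal Python sketch, assuming the standard `aws` partition unless another (such as `aws-cn`) is passed; the function names are hypothetical:

```python
def bucket_arn(bucket: str, partition: str = "aws") -> str:
    """Build the ARN for an S3 bucket: arn:<partition>:s3:::<bucket-name>."""
    return f"arn:{partition}:s3:::{bucket}"


def object_arn_pattern(bucket: str, partition: str = "aws") -> str:
    """ARN pattern matching every object in the bucket ("<bucket-arn>/*")."""
    return f"{bucket_arn(bucket, partition)}/*"


if __name__ == "__main__":
    print(bucket_arn("account-b-streaming-bucket"))
    print(object_arn_pattern("account-b-streaming-bucket"))
```

The first value goes in the bucket-level statement's Resource and the second in the object-level statement's Resource.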