Cross-account S3 access
Overview
This document describes the AWS setup needed to allow Matillion ETL to load data from Amazon S3 files owned by a different AWS account. The AWS authorization model makes this possible without the bucket owner having to make the data world-readable.
This configuration can be used whenever one AWS account needs to securely share S3 files with another.
Overview of Authorization
Resources such as S3 buckets and files are private to their owner by default. As a result, using the S3 Load component to access a bucket belonging to another account fails until access has been granted.
To enable cross-account access, three steps are necessary:
- The S3-bucket-owning account grants S3 privileges to the root principal of the Matillion ETL account. Granting access to the account as a whole, rather than to individual IAM users, minimizes the coupling between the two accounts.
- Within the Matillion ETL account, an EC2 instance role delegates that access to Matillion ETL and, from there, to Redshift.
- In the Matillion ETL environment setup, the Instance Credentials option lets Redshift use the instance's credentials and so inherit its AWS permissions.
Configuring the S3-bucket-owning account
To let selected users access the data without making it public, the bucket owner must add an authorization policy to the bucket.
The owning account doesn't need to know anything about the IAM users in the Matillion account. Every AWS account has a root principal, so access can be granted to the root of the Matillion AWS account. Admins grant this access by adding a bucket policy in the S3 management console (in the current console, on the bucket's Permissions tab).
Three actions are needed in the bucket policy:
- Allow s3:ListBucket on the bucket itself.
- Allow s3:GetObject on file(s) in the bucket.
- Allow s3:GetBucketLocation on the bucket.
The attached example policy file (owner-policy.txt), found at the bottom of this page, includes all three actions. It is only an example: use it as a template for your own policy rather than as-is.
Matillion ETL uses the s3:ListBucket privilege to validate the bucket name, and the Redshift bulk loader uses it for filename prefix matching. The "Resource" of the s3:GetObject statement can end in an asterisk (allowing Matillion ETL to read any file in the bucket), or it can be a narrower pattern that matches only specific files.
In the policy editor, substitute the AWS account ID of the Matillion/Redshift account and your own bucket name into the example policy.
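As a rough illustration of the policy structure described above, the sketch below builds a bucket policy with the three required actions, granted to the root principal of the Matillion/Redshift account. It is not the contents of owner-policy.txt; the function name, the bucket name `example-source-bucket`, and the account ID `111122223333` are hypothetical placeholders.

```python
import json


def build_owner_bucket_policy(
    bucket: str, matillion_account_id: str, key_pattern: str = "*"
) -> dict:
    """Sketch of a cross-account bucket policy (hypothetical helper, not owner-policy.txt).

    Grants the three actions listed above to the root principal of the
    Matillion/Redshift account.
    """
    if not (len(matillion_account_id) == 12 and matillion_account_id.isdigit()):
        raise ValueError("AWS account IDs are 12 digits")
    principal = {"AWS": f"arn:aws:iam::{matillion_account_id}:root"}
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: name validation, prefix matching and region lookup.
                "Sid": "AllowBucketLevelAccess",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level read: "*" covers every file, a prefix such as "exports/*" narrows it.
                "Sid": "AllowObjectRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{key_pattern}",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own bucket name and account ID.
    policy = build_owner_bucket_policy("example-source-bucket", "111122223333")
    print(json.dumps(policy, indent=2))
```

The printed JSON is what would be pasted into the bucket policy editor. Splitting bucket-level and object-level actions into separate statements matters: s3:ListBucket and s3:GetBucketLocation apply to the bucket ARN, while s3:GetObject applies to object ARNs under it.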
At this point, the root user of the Matillion account can list this bucket and read files. However, a second round of authorization is required to delegate this access to Matillion ETL and Redshift.
Configuring the Matillion account
If your Matillion ETL instance has been set up with an EC2 instance role as described in Manage Credentials, then it is likely that no extra configuration is required.
If you're using coarse-grained access control, the two S3 privileges required by Matillion ETL are included in the AmazonS3FullAccess policy. If you're using fine-grained access control, both privileges are in the "Recommended Actions" list.
If your setup is different, first use the IAM console to create an EC2 service role.
Give the new role the privileges described in Manage Credentials; at a minimum, these must include the privileges in the attached policy file (user-policy.txt).
Finally, select this EC2 service role as the IAM role when you launch your Matillion ETL instance.
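For cross-account access, a request succeeds only if both the bucket policy in the owning account and the role's own permissions policy in the Matillion account allow it. The sketch below shows the two documents such an EC2 service role needs: a trust policy letting the EC2 service assume the role (the IAM console generates this for you when you choose EC2 as the trusted service), and a permissions policy. The permissions policy is an assumption that mirrors the three bucket-policy actions; the actual user-policy.txt may differ. The function names and `example-source-bucket` are hypothetical.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust policy that lets the EC2 service assume the role for the instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def cross_account_read_policy(bucket: str) -> dict:
    """Assumed minimal permissions policy (hypothetical; user-policy.txt may differ).

    Mirrors the three actions granted by the bucket owner, scoped to that bucket.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-source-bucket" is a hypothetical name for the other account's bucket.
    print(json.dumps(ec2_trust_policy(), indent=2))
    print(json.dumps(cross_account_read_policy("example-source-bucket"), indent=2))
```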
Configuring the Matillion S3 Load component
Now that the EC2 service role is delegating the S3 access to Matillion ETL, you should be able to configure the S3 Load component in an orchestration job.
The dropdown list of the S3 URL Location property will not contain the other account's bucket, because it lists only your own account's buckets. Type the bucket name into the field instead.
The other account's bucket may be in a different AWS Region from Matillion ETL. In that case the load fails with an error such as:
S3ServiceException: The bucket you are attempting to access must be addressed using the specified endpoint.
This failure is deliberate. Matillion ETL could find the bucket's region automatically, but moving data between regions has an associated cost, so a cross-region load must be requested explicitly: change the Region property from its default of None to the region where the source S3 bucket is located.
The IAM Role ARN property of the S3 Load component is only available in Matillion ETL for Redshift.
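To find the value for the Region property, you can ask S3 where the bucket lives; the s3:GetBucketLocation grant above is what allows this across accounts. The response's LocationConstraint is not always a plain region name: it is empty for us-east-1, and the legacy value "EU" means eu-west-1. The sketch below normalizes that value and decides whether the Region property needs changing. It takes the raw value as input rather than calling AWS (with boto3, for example, it would come from `boto3.client("s3").get_bucket_location(Bucket=...)["LocationConstraint"]`); the function names are hypothetical.

```python
from typing import Optional

# Legacy LocationConstraint values that are not region names.
_LEGACY_LOCATIONS = {"EU": "eu-west-1"}


def bucket_region(location_constraint: Optional[str]) -> str:
    """Normalize a GetBucketLocation LocationConstraint into a region name."""
    if not location_constraint:
        # An empty or null constraint means the bucket is in us-east-1.
        return "us-east-1"
    return _LEGACY_LOCATIONS.get(location_constraint, location_constraint)


def region_property_for(
    matillion_region: str, location_constraint: Optional[str]
) -> Optional[str]:
    """Return the value to set in the Region property, or None to keep the default."""
    region = bucket_region(location_constraint)
    return None if region == matillion_region else region


if __name__ == "__main__":
    # Example: Matillion ETL in us-east-1, source bucket in eu-central-1.
    print(region_property_for("us-east-1", "eu-central-1"))
```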
Redshift Authorization
Matillion ETL uses the Redshift bulk loader (the COPY command) and never handles any of the S3 data itself: Redshift reads from S3 and performs the load.
Since the access was delegated through an EC2 instance role, how does Redshift get permission to read the file?
The answer is in the environment settings within Matillion ETL. Setting the Credentials property to Instance Credentials allows Redshift to inherit the permissions granted to Matillion ETL.
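Instance credentials are temporary: an access key ID, a secret access key and a session token. Redshift's COPY command accepts these through its ACCESS_KEY_ID, SECRET_ACCESS_KEY and SESSION_TOKEN parameters, which is one way a loader can pass the instance's permissions on to Redshift. The sketch below assumes, without confirmation from this page, that Matillion ETL does something equivalent; it is not the SQL Matillion actually generates. The function name, table and values are hypothetical, and file-format options are omitted.

```python
from typing import Optional


def _quote(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def copy_with_temporary_credentials(
    table: str,
    s3_url: str,
    access_key_id: str,
    secret_access_key: str,
    session_token: str,
    region: Optional[str] = None,
) -> str:
    """Build a COPY statement authorized with temporary (instance) credentials."""
    parts = [
        f"COPY {table}",
        f"FROM {_quote(s3_url)}",
        f"ACCESS_KEY_ID {_quote(access_key_id)}",
        f"SECRET_ACCESS_KEY {_quote(secret_access_key)}",
        f"SESSION_TOKEN {_quote(session_token)}",
    ]
    if region:
        # Only needed when the bucket is in a different region from the cluster.
        parts.append(f"REGION {_quote(region)}")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical values; real credentials come from the instance role and expire.
    print(copy_with_temporary_credentials(
        "public.sales", "s3://example-source-bucket/exports/",
        "AKIAEXAMPLE", "example-secret", "example-token",
    ))
```

Because the session token expires, credentials of this kind have to be fetched fresh for each load rather than stored.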
The same permission-delegation technique works equally well with a Redshift service role. In that case, manually associate the service role with the Redshift cluster (under Manage IAM Roles in the Redshift console), then supply the role in the S3 Load component's IAM Role ARN property.
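With a Redshift service role, the role's trust policy must name the Redshift service, and COPY is authorized with the IAM_ROLE parameter instead of explicit credentials. As with the EC2 role, the role's own permissions policy must also allow the S3 actions, because cross-account access needs both sides to agree. The sketch below builds the trust policy and a COPY statement, after checking the role ARN format. It is an illustration, not Matillion's generated SQL; the function names, table and role name are hypothetical, and the ARN check assumes the standard `aws` partition.

```python
import re
from typing import Optional

# Standard-partition IAM role ARN: arn:aws:iam::<12-digit account>:role/<optional path/>name
_ROLE_ARN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")


def _quote(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def redshift_trust_policy() -> dict:
    """Trust policy that lets Redshift assume the service role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def copy_with_iam_role(
    table: str, s3_url: str, role_arn: str, region: Optional[str] = None
) -> str:
    """Build a COPY statement authorized by a role associated with the cluster."""
    if not _ROLE_ARN.fullmatch(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    parts = [f"COPY {table}", f"FROM {_quote(s3_url)}", f"IAM_ROLE {_quote(role_arn)}"]
    if region:
        # Only needed when the bucket is in a different region from the cluster.
        parts.append(f"REGION {_quote(region)}")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(copy_with_iam_role(
        "public.sales", "s3://example-source-bucket/exports/",
        "arn:aws:iam::111122223333:role/ExampleRedshiftRole",
    ))
```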