Snowflake GCP Storage Integration setup guide
This topic covers how to set up a storage integration for use in Snowflake instances on the Google Cloud Platform (GCP). Matillion ETL for Snowflake on GCP requires users to select a storage integration when configuring data staging components.
A storage integration is a named Snowflake object that removes the need to pass explicit cloud provider credentials such as secret keys or access tokens. An integration object references a Google Cloud Storage service account.
- Completing the instructions below requires access to your Google Cloud Storage project as a project editor. If you aren't a project editor, please contact your Cloud Storage administrator for assistance.
- Only users with the Snowflake role ACCOUNTADMIN, or a role with the global CREATE INTEGRATION privilege, can execute the SQL commands mentioned in this topic.
- Any identifiers not enclosed in double quotes ( " " ) aren't case sensitive. The examples below demonstrate this: an integration created as `DocsTeam` is stored and returned by Snowflake as `DOCSTEAM`.
Create a storage integration in Snowflake
- Log in to your Snowflake account.
- In either an existing Snowflake worksheet or a new worksheet, use the `CREATE STORAGE INTEGRATION` command. The form of the command is as follows:

```sql
CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ];
```
This command uses the following variables:

- `<integration_name>`: The name of the new integration.
- `<bucket>`: The name of a Google Cloud Storage bucket that stores your data files.
- `<path>`: An optional file path used to provide granular control over objects in the Cloud Storage bucket.
`STORAGE_ALLOWED_LOCATIONS` and `STORAGE_BLOCKED_LOCATIONS` let you specify buckets that you wish to permit access to, or block access from, respectively. As shown in the template above, locations are enclosed in single quotation marks, and multiple locations are separated by commas. The `STORAGE_BLOCKED_LOCATIONS` parameter is optional.
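As a concrete sketch, a minimal command permitting a single bucket might look like the following; the bucket name and path here are placeholders, not values from your account:

```sql
-- Example only: 'my-bucket' and 'data/' are illustrative placeholders.
CREATE STORAGE INTEGRATION DocsTeam
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://my-bucket/data/');
```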
Running the command for an integration named `DocsTeam`, for example, creates the storage integration `DOCSTEAM`. The command creates the storage integration and explicitly limits external stages that use it to referencing the specified buckets. Additional stages that use this integration can also reference the permitted buckets.
- In your Snowflake worksheet, execute the `GRANT` command to grant usage on this integration to the PUBLIC role:

```sql
GRANT USAGE ON INTEGRATION <integration_name> TO ROLE PUBLIC;
```
- In your Snowflake worksheet, execute the `DESCRIBE INTEGRATION` command to retrieve the ID of the Cloud Storage service account that was created automatically for your Snowflake account. You can abbreviate `DESCRIBE` to `DESC` when writing this command, and the `STORAGE` keyword is optional:

```sql
DESC [STORAGE] INTEGRATION <integration_name>;
```
After you run the `DESC` command, locate the row whose `property` column reads `STORAGE_GCP_SERVICE_ACCOUNT`. From the `property_value` column of that row, copy the ID of the Cloud Storage service account. You will need this ID later.
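As an illustrative sketch (not real output), the relevant portion of the `DESC` results has this shape; the service account value below is a placeholder showing the general format, not an actual ID:

```sql
DESC STORAGE INTEGRATION DocsTeam;

-- property                     property_type  property_value
-- ENABLED                      Boolean        true
-- STORAGE_GCP_SERVICE_ACCOUNT  String         <service-account-id>@<project>.iam.gserviceaccount.com
```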
As the Snowflake documentation notes, Snowflake provisions a single Cloud Storage service account for your entire Snowflake account. All Cloud Storage integrations use that service account.
Granting service account permissions to access bucket objects
Follow the steps below to configure IAM access permissions in the Google Cloud console so that a Cloud Storage bucket can be used to load and unload data:
- Log in to the Google Cloud Platform console.
- Click IAM & admin.
- Click Roles from the drop-down menu.
- Click + CREATE ROLE.
- Complete the Title, Description, and ID fields.
- Click + ADD PERMISSIONS.
- Add the Cloud Storage permissions required for loading and unloading data. The Snowflake documentation lists these as `storage.buckets.get`, `storage.objects.create`, `storage.objects.delete`, `storage.objects.get`, and `storage.objects.list`.
- Click Create (or Update if you are editing a role).
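If you prefer the command line, the role-creation steps above can also be sketched with the gcloud CLI. This is a sketch under stated assumptions: `my-project` and `SnowflakeStageRole` are placeholder names, and the permission list follows Snowflake's documentation for loading and unloading data:

```shell
# Create a custom role holding the Cloud Storage permissions Snowflake's
# documentation lists for loading and unloading data.
# "my-project" and "SnowflakeStageRole" are example values only.
gcloud iam roles create SnowflakeStageRole \
  --project=my-project \
  --title="Snowflake Stage Role" \
  --permissions=storage.buckets.get,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.list
```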
Assigning the custom role to the Cloud Storage service account
- Click the menu button in the top left of the GCP Console.
- Click Cloud Storage, then Buckets.
- On the Buckets page, search and select your Cloud Storage bucket or create a new one by clicking Create at the top.
- Inside your bucket's details page, click the Permissions tab.
- Click the blue +GRANT ACCESS button in the already opened VIEW BY PRINCIPALS tab.
- The Grant access form will be displayed. Paste the STORAGE_GCP_SERVICE_ACCOUNT ID you obtained in Create a storage integration in Snowflake into the New principals text box. For more information, read Learn more about principals in IAM.
- Under the Assign roles heading, click the Role drop-down menu and select Cloud Storage, then Storage Admin.
- Optionally, you can add an IAM condition to grant access to principals only when the conditions you specify are met. For more information, read IAM Conditions.
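Equivalently, the grant can be sketched with the gcloud CLI. The bucket name, project, and service account below are placeholders; substitute the service account ID returned by `DESC INTEGRATION`:

```shell
# Grant Storage Admin on the bucket to the Snowflake-provisioned
# Cloud Storage service account (placeholder values shown).
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member="serviceAccount:<service-account-id>@<project>.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
```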
Using the integration in Matillion ETL
- In Matillion ETL, create a new orchestration job, and drag your chosen data staging component onto the canvas.
- Click on the data staging component, then click its Properties tab and set the following properties:
- Staging: Select Existing Google Cloud Storage Location. The alternative, Snowflake Managed, is the default setting and will configure sensible defaults automatically.
- GCS Staging Area: Select your Cloud Storage bucket.
- Storage Integration: Select your newly created integration.
If you require assistance creating a storage integration in Snowflake, or with any other aspect of Matillion ETL, visit our Getting Support page.
Consult the Snowflake documentation to read more about these topics, for example, to learn how to modify an existing stage to use a storage integration.