Set up Amazon Redshift
This page is a guide to configuring your AWS account to use Amazon Redshift as a destination within Data Loader.
The Amazon Redshift Destination can be configured instantly, as part of the pipeline setup process. The ingested data is first staged in your Amazon S3 bucket before it's batched and loaded to the Amazon Redshift destination.
Prerequisites
- You must have an active AWS account.
- You must have an active Amazon Redshift instance. Instructions for creating an Amazon Redshift destination are outside the scope of this tutorial. This page presumes that you have an active instance. For help getting started with Amazon Redshift, read Amazon's Getting started with Amazon Redshift.
- Permissions in AWS to:
- Create and manage security groups, which are required to allow Matillion's IP addresses.
- View database details—required for retrieving the database's connection details.
- You must have
SELECT
privileges, or privileges that let you create users and grant privileges. Required to create a database user for Data Loader.
Step 1: Allow IP addresses
You need to allow the Matillion IP address for your region to connect to your Amazon Redshift database.
- Log in to the Amazon Redshift dashboard.
- In the left navigation pane, click Clusters.
- Click the cluster that you want to connect to Data Loader.
- In the Configuration tab, click the link text under Cluster Properties → VPC security groups to open the Security Groups panel.
- In the Security Groups panel, click Inbound, and then, click Edit.
- In the Edit inbound rules dialog box:
- Click Add Rule.
- In the Type column, select Redshift from the drop-down.
- In the Port Range column, enter the port of your Amazon Redshift cluster. The default value is
5439
. - In the Source column, select Custom from the drop-down and enter the Matillion IP addresses for your region. Repeat this step to allow all the IP addresses.
Step 2: Create a user and grant privileges
Create a user
- Connect to your Amazon Redshift instance using your SQL client.
- After connecting, run this command to create a user named "matillion". Replace
<password>
with a secure password:
CREATE USER matillion WITH PASSWORD '<password>';
Grant privileges to the user
- Log in to your Amazon Redshift database as a
superuser
. - Enter the following commands:
Grant CREATE
privilege to the database user for an existing database:
GRANT CREATE ON DATABASE <database_name> TO matillion;
GRANT CREATE ON SCHEMA <SCHEMA_NAME> TO <USER>;
GRANT USAGE ON SCHEMA <SCHEMA_NAME> TO <USER>;
Grant SELECT
privilege to all tables or specific tables:
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO matillion; #all tables
GRANT SELECT ON TABLE <schema_name>.<table_name> TO matillion; #specific table
Step 3: Retrieve the endpoint and port number
- Log in to the Amazon Redshift dashboard.
- In the left navigation pane, click Clusters.
- Select the Cluster that you want to connect to Data Loader.
- Click Properties and locate the following fields:
- Endpoint: In the Connection details pane's Endpoint field, click Copy to copy the endpoint details. You need them to complete the destination setup in Data Loader.
- Port: This is the port used by the database. The default Amazon Redshift port is
5439
.