Setup guide - Hybrid SaaS Databricks on AWS

This document describes the steps required to set up your first working project in the Data Productivity Cloud with the following configuration:

Deployment type:

Hybrid SaaS

Cloud platform:

AWS

Cloud data warehouse:

Databricks

Prerequisites

AWS requirements

An AWS account with permissions to use the CloudFormation template.
An AWS user account or role with permissions to create:
- ECS clusters.
- Task definitions.
- IAM roles for task execution, including, at a minimum, AWSServiceRoleForECS. If your account doesn't have this role, create it by following the instructions in Creating a service-linked role for Amazon ECS.
- S3 buckets.
- CloudWatch log groups.
- AWS Secrets Manager secrets.
Access to the following AWS resources:
- A virtual private cloud (VPC).
- A private subnet.
- A security group, minimally allowing access.
Allowed access to the IP addresses listed in Allowing IP addresses.
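
The AWSServiceRoleForECS requirement above can also be checked from a script. The following is a minimal sketch, not part of any Matillion tooling: it assumes the AWS SDK for Python (boto3) is installed and that your credentials allow `iam:GetRole` and `iam:CreateServiceLinkedRole`. The helper name `ensure_ecs_service_linked_role` is hypothetical.

```python
ECS_ROLE_NAME = "AWSServiceRoleForECS"


def ensure_ecs_service_linked_role(iam_client):
    """Return True if the role already existed, False if it was created.

    `iam_client` is assumed to be a boto3 IAM client, for example
    boto3.client("iam"). It is passed in so the helper stays easy to test.
    """
    try:
        # get_role raises NoSuchEntityException when the role is missing.
        iam_client.get_role(RoleName=ECS_ROLE_NAME)
        return True
    except iam_client.exceptions.NoSuchEntityException:
        # Ask IAM to create the service-linked role for Amazon ECS.
        iam_client.create_service_linked_role(AWSServiceName="ecs.amazonaws.com")
        return False
```

With boto3 installed, a call such as `ensure_ecs_service_linked_role(boto3.client("iam"))` runs the check against the account your current credentials resolve to.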

Databricks requirements

A Databricks account with the following information:
- Your Databricks instance name.
- Your Databricks personal access token.
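
Before entering these two values in the Data Productivity Cloud, it can help to confirm that they work together. The sketch below is one possible check, using only the Python standard library: it sends the personal access token as a bearer token to the Databricks REST API's `clusters/list` endpoint. The function names are hypothetical, and the choice of endpoint is an assumption; any authenticated endpoint your user can call would serve the same purpose.

```python
import urllib.error
import urllib.request


def build_databricks_request(instance_name, token):
    """Build an authenticated GET request for the clusters/list endpoint."""
    # Accept either a bare instance name or one pasted with a scheme.
    host = instance_name.strip()
    if host.startswith("https://"):
        host = host[len("https://"):]
    host = host.rstrip("/")
    url = f"https://{host}/api/2.0/clusters/list"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def check_databricks_credentials(instance_name, token, timeout=10):
    """Return True if the workspace answers the request with HTTP 200.

    HTTP errors (for example 401 or 403) return False. Network-level
    failures such as DNS errors are left to propagate as URLError.
    """
    request = build_databricks_request(instance_name, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

A `False` result points at the token or instance name, while a raised `URLError` points at connectivity between your machine and the workspace.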

Connectivity requirements

Access enabled for the IP addresses listed under the Hybrid SaaS section of Allowing IP addresses.

Git requirements

If you choose to use your own Git provider instead of the Matillion-hosted Git option, you need the following:

The Matillion Git app installed in your organization's account with one of the supported Git providers:
- GitHub.
- Azure DevOps.
- GitLab.
- Bitbucket.

Setup steps

Register for a Data Productivity Cloud account.
Create accounts for users and admins who will be active in the Data Productivity Cloud.
Create an agent in the Data Productivity Cloud.
Deploy a Fargate agent in AWS using CloudFormation.
- If you have multiple VPCs, or you link your VPCs to on-premises environments to access privately hosted databases, APIs, and other data sources, make sure that the CIDR of the agent's VPC and the IP range of its subnet are compatible with, and properly linked to, those other networks where required.
Create a project, making the following choices:
- Select Advanced settings.
- Select the agent you created and deployed previously.
- Select the Git provider you wish to use.
Create an environment using your Databricks credentials.
Set up secret definitions for Databricks credentials, passwords, API keys, and tokens.
Create a Git branch in which to begin pipeline work.
Create your first pipeline.
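
For the Fargate agent step, the note about compatible CIDR and subnet ranges can be checked before deployment. The sketch below reads "compatible" as "non-overlapping", which is an assumption based on the general rule that peered or linked networks cannot route between overlapping address ranges. The function names and the example ranges are hypothetical.

```python
import ipaddress


def find_cidr_conflicts(agent_vpc_cidr, linked_cidrs):
    """Return the linked network CIDRs that overlap the agent VPC's CIDR."""
    agent_net = ipaddress.ip_network(agent_vpc_cidr)
    return [
        cidr
        for cidr in linked_cidrs
        if agent_net.overlaps(ipaddress.ip_network(cidr))
    ]


def subnet_in_vpc(subnet_cidr, vpc_cidr):
    """Return True if the subnet's range lies entirely inside the VPC's CIDR."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(ipaddress.ip_network(vpc_cidr))


# Example with made-up ranges: only the first linked network overlaps.
conflicts = find_cidr_conflicts(
    "10.0.0.0/16",
    ["10.0.128.0/20", "10.1.0.0/16", "192.168.0.0/24"],
)
print(conflicts)  # ['10.0.128.0/20']
print(subnet_in_vpc("10.0.1.0/24", "10.0.0.0/16"))  # True
```

An empty conflict list only shows that the address ranges do not collide; routing, peering, and security group rules still need to be configured separately.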
