Setup guide - Hybrid SaaS Databricks on AWS

This document describes the steps required to set up your first working project in the Data Productivity Cloud for the following configuration options:


Prerequisites

AWS requirements

Databricks requirements

Connectivity requirements

Git requirements

If you choose to use your own Git provider instead of the Matillion-hosted Git option, you need the following:


Setup steps

  1. Register for a Data Productivity Cloud account.
  2. Create accounts for users and admins who will be active in the Data Productivity Cloud.
  3. Create an agent in the Data Productivity Cloud.
  4. Deploy a Fargate agent in AWS using CloudFormation.
    • If you have multiple VPCs, or your VPCs are linked to on-premises environments to reach privately hosted databases, APIs, and other data sources, make sure the CIDR block of the agent's VPC and the IP range of its subnet are compatible with those networks and properly linked to them where required.
  5. Create a project, making the following choices:
    • Select Advanced settings.
    • Select the agent you created and deployed previously.
    • Select the Git provider you wish to use.
  6. Create an environment using your Databricks credentials.
  7. Set up secret definitions for Databricks credentials, passwords, API keys, and tokens.
  8. Create a Git branch in which to begin pipeline work.
  9. Create your first pipeline.
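
Before deploying the agent in step 4, it can help to sanity-check the network ranges involved. The following is a minimal sketch using only the Python standard library. It assumes that "compatible" means the agent VPC's CIDR block does not overlap any network it must be linked to, and that the agent's subnet falls inside its VPC. The function names and all CIDR values are hypothetical placeholders, not values taken from the Data Productivity Cloud or from your AWS account.

```python
import ipaddress


def find_cidr_conflicts(agent_vpc_cidr, linked_cidrs):
    """Return the linked CIDR blocks that overlap the agent VPC's CIDR block."""
    agent_net = ipaddress.ip_network(agent_vpc_cidr)
    return [
        cidr
        for cidr in linked_cidrs
        if agent_net.overlaps(ipaddress.ip_network(cidr))
    ]


def subnet_within_vpc(subnet_cidr, vpc_cidr):
    """Return True if the subnet's IP range lies entirely inside the VPC's CIDR block."""
    return ipaddress.ip_network(subnet_cidr).subnet_of(ipaddress.ip_network(vpc_cidr))


# Hypothetical example values; replace them with your own ranges.
AGENT_VPC = "10.20.0.0/16"
AGENT_SUBNET = "10.20.1.0/24"
LINKED_NETWORKS = [
    "10.0.0.0/16",      # another VPC (assumed)
    "10.20.128.0/17",   # overlaps the agent VPC (assumed, to show a conflict)
    "192.168.0.0/24",   # on-premises network (assumed)
]

conflicts = find_cidr_conflicts(AGENT_VPC, LINKED_NETWORKS)
subnet_ok = subnet_within_vpc(AGENT_SUBNET, AGENT_VPC)

print("Overlapping networks:", conflicts)
print("Agent subnet inside agent VPC:", subnet_ok)
```

An empty list of overlapping networks, together with a subnet that sits inside the VPC, suggests the ranges can be linked without address conflicts. The check does not confirm that routing, peering, or security groups are actually configured.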