Databricks

The Data Productivity Cloud extracts data from diverse data sources and loads it into a cloud data platform. Before you run any pipelines in the Data Productivity Cloud, you need a connection to a suitable cloud data platform account. This topic discusses the basics of connecting the Data Productivity Cloud to Databricks.


Full SaaS or Hybrid SaaS

The Data Productivity Cloud can run in either a Full SaaS or Hybrid SaaS architecture.

  • Databricks on AWS is compatible with both Full SaaS and Hybrid SaaS.
  • Databricks on Azure currently offers only Full SaaS. Azure support for Hybrid SaaS is coming soon.

Compute types

The Data Productivity Cloud supports the following Databricks compute types:

  • All-purpose compute (Databricks runtimes 10.4 and above are supported).
  • Classic SQL warehouses.
  • Serverless SQL warehouses (recommended).

Read Compute in the Databricks documentation for more information.
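When a client connects over the Databricks SQL interface, the compute type is identified by its HTTP path. As a rough, hedged illustration (the IDs below are placeholders, not real resources), the two shapes look like this:

```python
# Hypothetical HTTP paths; substitute the values shown in your Databricks workspace.

# SQL warehouses (classic or serverless) are addressed like this:
WAREHOUSE_HTTP_PATH = "/sql/1.0/warehouses/1234567890abcdef"

# All-purpose compute clusters are addressed like this:
CLUSTER_HTTP_PATH = "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh"
```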


Authentication to Databricks

Currently, the Data Productivity Cloud supports these authentication methods when connecting to Databricks:

  • Username/password
  • Personal access token

Note

To use personal access token authentication in pipeline components, enter token as the username and the actual value of the token as the password.
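As a minimal sketch of testing a personal access token outside the Data Productivity Cloud, the databricks-sql-connector Python package accepts the token directly (the hostname, HTTP path, and token below are placeholders):

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder workspace details; substitute your own.
# In Data Productivity Cloud pipeline components, this same token goes in the
# password field, with the literal string "token" as the username.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/1234567890abcdef",
    access_token="dapi...your-personal-access-token...",
) as connection, connection.cursor() as cursor:
    cursor.execute("SELECT current_catalog(), current_user()")
    print(cursor.fetchone())
```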


Catalog types

We recommend using Unity Catalog-enabled workspaces. The Data Productivity Cloud does support Hive catalogs, but many of its advanced features (such as Unity Catalog Volumes staging) and future features rely on Unity Catalog workspaces.


Feature support

Some features will only work with specific Databricks runtimes and configurations:

| Feature | Minimum Databricks runtime | Notes |
| --- | --- | --- |
| Unity Catalog Volumes staging | 13.4+ | |
| Run Notebook | 10.4+ | If you are using serverless SQL or classic SQL compute, you can only run SQL notebooks using the Run Notebook component. |
| Personal Staging | 10.4+ | You must be using personal access token authentication to use personal staging. This feature is being deprecated. |
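As a hedged sketch of what Unity Catalog Volumes staging amounts to at the SQL level (the connection details, catalog, schema, volume, and table names below are all hypothetical), a file staged in a volume can be loaded with COPY INTO:

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details and object names; substitute your own.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/1234567890abcdef",
    access_token="dapi...your-personal-access-token...",
) as connection, connection.cursor() as cursor:
    # Files staged in a Unity Catalog volume live under
    # /Volumes/<catalog>/<schema>/<volume>/.
    cursor.execute("""
        COPY INTO main.analytics.orders
        FROM '/Volumes/main/analytics/staging/orders.csv'
        FILEFORMAT = CSV
        FORMAT_OPTIONS ('header' = 'true')
    """)
```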

S3 buckets and Azure Blob storage

If you wish to load data from, or stage via, S3 buckets or Azure Blob storage, you must create AWS or Azure cloud credentials and associate them with your environment.

Also make sure that the instance profile attached to your Databricks compute resources has access to the same AWS or Azure storage.
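As a quick, hedged sanity check that a set of AWS credentials can reach a staging bucket (the bucket name and prefix below are hypothetical), you can list a few objects with boto3:

```python
import boto3  # pip install boto3

# Placeholder bucket and prefix; substitute your staging location.
s3 = boto3.client("s3")  # reads credentials from the environment

response = s3.list_objects_v2(
    Bucket="my-staging-bucket",
    Prefix="staging/",
    MaxKeys=5,
)

for obj in response.get("Contents", []):
    print(obj["Key"])
```

A similar check on the Databricks side (for example, running dbutils.fs.ls against the same location from a notebook) confirms that the instance profile can reach the storage too.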