Databricks

The Data Productivity Cloud extracts data from diverse data sources and loads it into a cloud data platform. Before you run any pipelines in the Data Productivity Cloud, you need a connection to a suitable cloud data platform account. This topic discusses the basics of connecting the Data Productivity Cloud to Databricks.


Full SaaS or Hybrid SaaS

The Data Productivity Cloud can run in either a Full SaaS or Hybrid SaaS architecture.

  • Databricks on AWS is compatible with both Full SaaS and Hybrid SaaS.
  • Databricks on Azure currently offers only Full SaaS. Azure support for Hybrid SaaS is coming soon.

Compute types

The Data Productivity Cloud supports the following Databricks compute types:

  • All-purpose compute (Databricks runtimes 10.4 and above are supported).
  • Classic SQL warehouses.
  • Serverless SQL warehouses (recommended).

Read Compute in the Databricks documentation for more information.
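When a client connects over the Databricks SQL interface, the compute type is identified by its HTTP path. As a rough, hedged illustration (the IDs below are placeholders, not real resources), the two shapes look like this:

```python
# Hypothetical HTTP paths; substitute the values shown in your Databricks workspace.

# SQL warehouses (classic or serverless) are addressed like this:
WAREHOUSE_HTTP_PATH = "/sql/1.0/warehouses/1234567890abcdef"

# All-purpose compute clusters are addressed like this:
CLUSTER_HTTP_PATH = "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh"
```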


Authentication to Databricks

Currently, the Data Productivity Cloud supports these authentication methods when connecting to Databricks:

  • Username/password
  • Personal access token

Note

To use personal access token authentication in pipeline components, enter token as the username and the actual value of the token as the password.
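As a minimal sketch of testing a personal access token outside the Data Productivity Cloud, the databricks-sql-connector Python package accepts the token directly (the hostname, HTTP path, and token below are placeholders):

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder workspace details; substitute your own.
# In Data Productivity Cloud pipeline components, this same token goes in the
# password field, with the literal string "token" as the username.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/1234567890abcdef",
    access_token="dapi...your-personal-access-token...",
) as connection, connection.cursor() as cursor:
    cursor.execute("SELECT current_catalog(), current_user()")
    print(cursor.fetchone())
```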


Catalog types

We recommend using Unity Catalog-enabled workspaces. The Data Productivity Cloud does support Hive catalogs, but many of its advanced features (such as Unity Catalog Volumes staging) and future features rely on Unity Catalog workspaces.


Feature support

Some features will only work with specific Databricks runtimes and configurations:

| Feature | Minimum Databricks runtime | Notes |
| --- | --- | --- |
| Unity Catalog Volumes staging | 13.4+ | |
| Run Notebook | 10.4+ | If you are using serverless SQL or classic SQL compute, you can only run SQL notebooks using the Run Notebook component. |
| Personal Staging | 10.4+ | You must be using personal access token authentication to use personal staging. This feature is being deprecated. |
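As a hedged sketch of what Unity Catalog Volumes staging amounts to at the SQL level (the connection details, catalog, schema, volume, and table names below are all hypothetical), a file staged in a volume can be loaded with COPY INTO:

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details and object names; substitute your own.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/1234567890abcdef",
    access_token="dapi...your-personal-access-token...",
) as connection, connection.cursor() as cursor:
    # Files staged in a Unity Catalog volume live under
    # /Volumes/<catalog>/<schema>/<volume>/.
    cursor.execute("""
        COPY INTO main.analytics.orders
        FROM '/Volumes/main/analytics/staging/orders.csv'
        FILEFORMAT = CSV
        FORMAT_OPTIONS ('header' = 'true')
    """)
```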

S3 buckets and Azure Blob storage

If you wish to load data from, or stage via, S3 buckets or Azure Blob storage, you must create AWS or Azure cloud credentials and associate them with your environment.

Also make sure that the instance profile attached to your Databricks compute resources has access to the same AWS or Azure storage.
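As a quick, hedged sanity check that a set of AWS credentials can reach a staging bucket (the bucket name and prefix below are hypothetical), you can list a few objects with boto3:

```python
import boto3  # pip install boto3

# Placeholder bucket and prefix; substitute your staging location.
s3 = boto3.client("s3")  # reads credentials from the environment

response = s3.list_objects_v2(
    Bucket="my-staging-bucket",
    Prefix="staging/",
    MaxKeys=5,
)

for obj in response.get("Contents", []):
    print(obj["Key"])
```

A similar check on the Databricks side (for example, running dbutils.fs.ls against the same location from a notebook) confirms that the instance profile can reach the storage too.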