Databricks Jobs Compute configuration
Run and schedule Data Productivity Cloud transformation pipelines on a Databricks Jobs Compute cluster. To enable this, you must create a Jobs Compute configuration in the Data Productivity Cloud and associate it with an environment using the Cloud Platform Databricks integration. This configuration provides the necessary details for Databricks to launch the Jobs Compute cluster where your transformation pipeline will execute.
Why use Jobs Compute?
Databricks Jobs Compute clusters offer a range of benefits tailored for cost-effective data transformations.
Automated and optimized resource usage
- Ephemeral clusters: Jobs Compute clusters are typically spun up on demand and automatically terminated when the job completes. This ensures resources are only used when needed, reducing idle time.
- Cost efficiency: Since you only pay for compute while the job is running, this model can significantly lower costs compared to using always-on interactive clusters.
Environment consistency
- Repeatable execution: Each job starts in a fresh cluster environment, which helps avoid problems caused by lingering dependencies or residual data from previous runs.
- Controlled configurations: You define the Databricks Runtime version, node type, and number of workers, ensuring consistency across development, staging, and production runs.
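As a concrete illustration of both points, the following sketch uses the open-source Databricks SDK for Python to submit a one-time run on a fresh job cluster: Databricks creates the cluster for the run, the runtime version, node type, and worker count are pinned in the specification, and the cluster terminates when the run finishes. The notebook path and the specific values shown are illustrative assumptions, not settings the Data Productivity Cloud requires.

```python
# A minimal sketch with the Databricks SDK for Python (pip install databricks-sdk).
# It submits a one-time run on an ephemeral job cluster, which Databricks creates
# for the run and terminates when the run finishes. All concrete values (notebook
# path, runtime, node type, worker count) are illustrative assumptions.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()  # authenticates from environment variables or config

run = w.jobs.submit(
    run_name="example-one-time-transformation",
    tasks=[
        jobs.SubmitTask(
            task_key="transform",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/example"),
            new_cluster=compute.ClusterSpec(      # ephemeral job cluster
                spark_version="15.4.x-scala2.12", # pinned runtime version
                node_type_id="i3.xlarge",         # pinned worker node type
                num_workers=2,                    # pinned worker count
            ),
        )
    ],
)
```

Because the cluster exists only for the duration of the run, compute charges stop as soon as the run completes, which is the cost model the bullets above describe.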
When to use Jobs Compute
Jobs Compute is ideal for transformation pipelines that don't require low-latency or real-time execution. Because these clusters are created on demand, they take considerably longer to start than an all-purpose cluster. This makes Jobs Compute a cost-effective choice for scheduled or batch workloads, where reduced infrastructure costs are prioritized over immediate responsiveness, and a reliable option for teams focused on optimizing resource usage.
Accessing Jobs Compute
The Jobs Compute feature is available at the project level. To access it:
- In the Data Productivity Cloud, go to the Your projects menu, and select a Databricks project.
- Along the top you will see a list of resource tabs. Click More on the far right, then select Jobs Compute. A list of your existing Jobs Compute configurations will be displayed.
This is where you manage your Jobs Compute configurations. From this page, you can create, edit, and delete configurations, as described in the following sections.
Create Jobs Compute configurations
To create Jobs Compute configurations that can later be associated with your environment, follow these steps:
- From the Jobs Compute configurations page, click the Add jobs compute button at the top.
- The Create a jobs compute configuration dialog will appear. Complete the following fields:
- Name: A unique name to identify the configuration.
- Access mode: Use the drop-down menu to select the access mode to be used by the cluster. For example, No isolation shared, Shared, and Single User. For more information, read Access modes.
- Worker type: Use the drop-down menu to select the node type to be used by the cluster's worker nodes. For example, r6id.xlarge or i3.xlarge.
- Databricks runtime version: Use the drop-down menu to select the runtime version that will be used by the cluster. For example, 15.4 LTS. For more information, read Databricks Runtime release notes versions and compatibility.
- Number of workers: The number of worker nodes you want Databricks to allocate for running the job. Enter a number between 0 and 100,000.
- Click Create to save the configuration.
You will return to the Jobs Compute configurations page.
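For reference, the dialog fields above correspond closely to a standard Databricks cluster specification. The sketch below shows one plausible mapping using the open-source Databricks SDK for Python; the access-mode enum names and all concrete values are assumptions for illustration, not what the Data Productivity Cloud generates behind the scenes.

```python
# A hedged sketch of how the dialog fields map onto a Databricks cluster
# specification, using the Databricks SDK for Python (pip install databricks-sdk).
# All concrete values are illustrative assumptions.
from databricks.sdk.service import compute

cluster_spec = compute.ClusterSpec(
    spark_version="15.4.x-scala2.12",  # Databricks runtime version (15.4 LTS)
    node_type_id="r6id.xlarge",        # Worker type
    num_workers=2,                     # Number of workers
    # Access mode: SINGLE_USER ~ "Single User", USER_ISOLATION ~ "Shared",
    # NONE ~ "No isolation shared".
    data_security_mode=compute.DataSecurityMode.SINGLE_USER,
)
```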
Edit Jobs Compute details
To edit an existing Jobs Compute configuration, follow these steps:
- On the Jobs Compute configurations page, click the ellipsis ... next to the configuration you want to modify.
- Click Edit Configuration.
- The Edit a jobs compute configuration page will open, allowing you to update the details specified during the initial setup. For more information, read Create Jobs Compute configurations.
- Click Update to save your changes.
You will return to the Jobs Compute configurations page.
Delete Jobs Compute configurations
To delete an existing Jobs Compute configuration, follow these steps:
- On the Jobs Compute configurations page, click the ellipsis ... next to the configuration you want to delete.
- Click Delete Configuration.
- A confirmation pop-up dialog will appear. Click Yes, delete to proceed.
Note
This action can't be undone, and will impact any environment or transformation pipeline where the Jobs Compute configuration is in use.
Once confirmed, the configuration will be removed from your list.
Associating Jobs Compute configurations with an environment
A Jobs Compute configuration must be created on the Jobs Compute configurations page before it can be associated with an environment. For more information, read Create Jobs Compute configurations.
To associate a Jobs Compute configuration with an environment, follow these steps:
- In the Data Productivity Cloud, go to the Your projects menu.
- Along the top you will see a list of resource tabs. Click Environments.
- A list of your environments will be displayed. For more information, read Environments. Click the ellipsis ... next to the environment you want to update.
- Click Associate jobs compute.
- The Environment name field will be pre-populated with your default environment. Use the drop-down menu to select the Jobs Compute configuration you want to associate.
- Click Associate.
All transformation pipelines in this environment will now run using the selected Jobs Compute configuration.
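For readers who think in Databricks job terms, the association behaves much like defining a single job cluster specification that every task in a job references: one cluster definition, reused by each pipeline. The sketch below, using the open-source Databricks SDK for Python, is an analogy under that assumption rather than a description of the Data Productivity Cloud's internals; all names and values are illustrative.

```python
# A hedged analogy, not the Data Productivity Cloud's implementation: one shared
# job cluster definition (like one Jobs Compute configuration) referenced by
# several tasks (like the pipelines in an environment). Values are illustrative.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

shared_cluster = jobs.JobCluster(
    job_cluster_key="shared-jobs-compute",
    new_cluster=compute.ClusterSpec(
        spark_version="15.4.x-scala2.12",
        node_type_id="r6id.xlarge",
        num_workers=2,
    ),
)

job = w.jobs.create(
    name="example-environment-pipelines",
    job_clusters=[shared_cluster],
    tasks=[
        jobs.Task(
            task_key="pipeline_a",
            job_cluster_key="shared-jobs-compute",  # both tasks run on the
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipeline_a"),
        ),
        jobs.Task(
            task_key="pipeline_b",
            job_cluster_key="shared-jobs-compute",  # same cluster definition
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipeline_b"),
        ),
    ],
)
```

Defining the cluster once and referencing it by key keeps every task's compute consistent, which mirrors why a single Jobs Compute configuration per environment gives all of its pipelines the same runtime, node type, and worker count.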