Databricks job compute configuration

You can run Matillion ETL transformation jobs on a Databricks job compute cluster. To do this, you must create a job compute configuration in Matillion ETL. The configuration contains the information that is passed to Databricks so that it can spin up the job compute cluster your transformation job will run on.

You can create several configuration profiles and select one at the time the job is run. These configuration profiles are managed through the Job Compute Configuration option on the Matillion ETL Project menu.

Note

You must configure a Databricks Environment before creating a job compute configuration.

Manage job compute configurations

On the Project menu, click Job Compute Configuration. This will open the Manage Jobs Compute Configurations dialog, which displays a list of the configurations you have created. In this dialog, you can:

- Add a new job compute configuration.
- Delete a configuration by clicking the X next to it.
- Edit the details of a configuration by clicking the pencil icon next to it.

Create a job compute configuration

On the Project menu, click Job Compute Configuration.
On the Manage Jobs Compute Configurations dialog, click + to open the Create Configuration dialog.
Enter the following:
- Name: A unique name to identify the configuration.
- Notebook Path: The path to a Databricks notebook. This path is used to generate a notebook with a unique name. The generated notebook is deleted when the job run is completed. Click + and use the Workspace list to browse for and select a valid path. After selecting the path, click OK.
- Number of Workers: The number of worker nodes you want Databricks to allocate to running the job. Enter a number between 0 and 100000.
- Spark Version: Select the Spark version that will be used by the cluster.
- Node Type: Select the node type to be used by the cluster.
- Data Security Mode: Select the security mode to be used.
Click Generate Configuration. This will put the configuration details into JSON format for passing to Databricks. This JSON is displayed in the Configuration field, allowing you to verify it shows the expected details.
Click OK to save the configuration.
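
As a rough illustration of what the generated JSON might contain, the Python sketch below assembles the dialog values into a cluster specification. The exact JSON that Matillion ETL generates is not shown in this article, so the key names (num_workers, spark_version, node_type_id, data_security_mode) are an assumption borrowed from the Databricks cluster specification, the function name build_job_compute_config is hypothetical, and the sample values are placeholders. Check the Configuration field in the dialog for the real output.

```python
import json


def build_job_compute_config(num_workers, spark_version, node_type_id, data_security_mode):
    """Assemble dialog values into a cluster-spec dictionary (hypothetical helper).

    Key names are assumed from the Databricks cluster specification;
    the JSON Matillion ETL actually generates may differ.
    """
    # The dialog accepts a worker count between 0 and 100000.
    if isinstance(num_workers, bool) or not isinstance(num_workers, int):
        raise ValueError("Number of Workers must be a whole number")
    if not 0 <= num_workers <= 100000:
        raise ValueError("Number of Workers must be between 0 and 100000")
    return {
        "num_workers": num_workers,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "data_security_mode": data_security_mode,
    }


# Placeholder values for illustration only; pick real ones from the dialog lists.
config = build_job_compute_config(
    num_workers=2,
    spark_version="13.3.x-scala2.12",
    node_type_id="i3.xlarge",
    data_security_mode="SINGLE_USER",
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Comparing output like this with the Configuration field is one way to confirm that the worker count, Spark version, node type, and security mode match what you selected before you save.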

Run transformation jobs on Databricks job compute

You must create at least one job compute configuration before running a transformation job on Databricks job compute.

To run the job:

Open the transformation job on the Matillion ETL job canvas.
Right-click on the canvas and select Run Job in Databricks Compute.
In the Run Job dialog, select the required configuration from the Databricks Compute Configuration drop-down.
Click OK.