Create Project (Delta Lake on Databricks)
A Matillion ETL project is a logical grouping of configuration settings and resources—such as jobs—required to use Matillion ETL. When users first log in to their Matillion ETL instance, they will be required to click Confirm in the Product Improvement Metrics dialog, and then they must create a project if no existing projects are available to select.
There are two routes to creating a new project. In both cases, finish by clicking Create Project:
- The first route is the Join Project dialog, which appears automatically when an instance first loads.
- The second route is to click Project, then click Switch Project.
There is no practical limit to the number of projects you can create. However, a client session uses only one project at a time, and each project must have a unique name.
Note
These instructions assume you have already successfully launched a Matillion ETL instance.
- You can connect to a serverless cluster if desired. Refer to Use serverless SQL warehouses and Serverless compute for more information.
- For help connecting to your instance, read Accessing the Matillion ETL Client (AWS) or Accessing the Matillion ETL Client (Azure).
- For help in adding credentials, read Manage Credentials.
Creating a Delta Lake on Databricks project on AWS
The following section describes how to create a project in Matillion ETL for Delta Lake on Databricks (AWS).
1. Project Details
Complete the following details:
- Project Group: Use the drop-down menu to choose an existing project group. Projects should be logically arranged in project groups.
- Project Name: Enter a suitable name for your new project.
- Project Description: Describe your project. This is optional.
- Private Project: Select this to make this new project private. Only users granted access can view and work in a private project.
- Include Samples: This is selected by default; clear it if you do not want to include sample jobs in this project.
2. AWS Connection
Complete the following details:
- Environment Name: Enter a name for your new Matillion ETL environment.
- AWS Credentials: Use the drop-down menu to choose credentials for the AWS cloud platform. Instance Credentials is selected by default. Click Manage to add a new set of credentials. Read Manage Credentials for more information.
3. Delta Lake Connection
Complete the following details:
:::info{title='Note'} Before completing the following steps in the Create Project dialog, you must have an AWS Databricks account. This will enable you to deploy a Databricks workspace. :::
- Workspace ID: Enter your existing Databricks workspace ID. This can be found as part of the URL of your Databricks workspace portal. Do not include cloud.databricks.com.
- Username: Enter the username for your Databricks workspace account. Alternatively, you can enter the word "token". For more information, read How to Generate a New Databricks Token.
- Password: Enter the password for your Databricks workspace account. Alternatively, provide the Token Value.
:::info{title='Note'} The following combinations are available in the Username and Password fields:
- Set Username as the account email. Set Password as the account password.
- Set Username as the account email. Set Password as the token value.
- Set Username as "token". Set Password as the token value. :::
To test the connection you must ensure all fields in the Delta Lake Connection dialog are populated with information. Click Test when you are ready.
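The three Username and Password combinations listed in the note above can be sketched as a small check. This is an illustrative sketch only: the names `SecretKind` and `is_supported_combination` are hypothetical and not part of Matillion ETL or Databricks, and the code assumes the word "token" must be entered exactly as shown, in lowercase.

```python
from enum import Enum


class SecretKind(Enum):
    """What was entered in the Password field (hypothetical helper type)."""
    ACCOUNT_PASSWORD = "account password"
    TOKEN_VALUE = "token value"


def is_supported_combination(username: str, secret: SecretKind) -> bool:
    """Return True if the pairing matches one of the three listed combinations."""
    if username == "token":
        # The literal word "token" is only listed together with a token value.
        return secret is SecretKind.TOKEN_VALUE
    # Assumption: any other non-empty username is the account email, which is
    # listed with either the account password or a token value.
    return bool(username.strip())


# Example pairings (the email address is a placeholder).
print(is_supported_combination("user@example.com", SecretKind.ACCOUNT_PASSWORD))
print(is_supported_combination("token", SecretKind.ACCOUNT_PASSWORD))
```

The only pairing the note does not list is the word "token" together with the account password, which is why the sketch rejects it.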
4. Delta Lake Defaults
Complete the following details:
- Endpoint/Cluster: Use the drop-down menu to select an existing Databricks Cluster to connect to within your Databricks workspace.
- Catalog: Use the drop-down menu to select an existing Databricks Unity Catalog to connect to.
- Database: Use the drop-down menu to select the Databricks Database to connect to.
Click Finish to create your project and environment.
Creating a Delta Lake on Databricks project on Azure
The following section describes how to create a project in Matillion ETL for Delta Lake on Databricks (Azure).
1. Project Details
Complete the following details:
- Project Group: Use the drop-down menu to choose an existing project group. Projects should be logically arranged in project groups.
- Project Name: Enter a suitable name for your new project.
- Project Description: Describe your project. This is optional.
- Private Project: Select this to make this new project private. Only users granted access can view and work in a private project.
- Include Samples: This is selected by default; clear it if you do not want to include sample jobs in this project.
2. Cloud Connection
Complete the following details:
- Environment Name: Enter a name for your new Matillion ETL environment.
- Azure Credentials: Use the drop-down menu to choose credentials for the Azure cloud platform. Instance Credentials is selected by default. Click Manage to add a new set of credentials. Read Manage Credentials for more information.
:::info{title='Note'} Ensure your Instance Credentials are correctly configured for the required cloud platform. For example, the Azure Blob Storage Load component relies on credentials with access to Blob Storage. :::
3. Delta Lake Connection
Complete the following details:
:::info{title='Note'} Before completing the following steps in the Create Project dialog, you will be required to create a Microsoft account to sign in and access the Microsoft Azure portal. This will enable you to deploy a Databricks workspace. :::
- Workspace ID: Enter your existing Databricks workspace ID. This can be found as part of the URL of your Azure Databricks Workspace portal. Do not include azuredatabricks.net.
- Username: The word "token" will appear as default. You do not need to change this. For more information about tokens and setting up this type of authentication for your Databricks workspace account, read Authentication using Azure Databricks personal access tokens.
- Password: Enter the Token Value for your Databricks workspace account.
To test the connection you must ensure all fields in the Delta Lake Connection dialog are populated with information. Click Test when you are ready.
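Both the AWS and Azure steps ask for a Workspace ID taken from the workspace URL, without the domain. A minimal sketch of that trimming is shown below. It assumes the Workspace ID is the part of the host name that precedes cloud.databricks.com or azuredatabricks.net. The function name `workspace_id_from_url` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Domain suffixes that the instructions say to leave out of the Workspace ID field.
DOMAIN_SUFFIXES = (".cloud.databricks.com", ".azuredatabricks.net")


def workspace_id_from_url(url: str) -> str:
    """Return the host-name prefix of a Databricks workspace URL."""
    # Accept URLs pasted with or without a scheme.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    for suffix in DOMAIN_SUFFIXES:
        if host.endswith(suffix):
            return host[: -len(suffix)]
    raise ValueError(f"Not a recognised Databricks workspace URL: {url!r}")


# Placeholder URLs, not real workspaces.
print(workspace_id_from_url("https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/"))
print(workspace_id_from_url("adb-1234567890123456.7.azuredatabricks.net"))
```

Raising an error for an unrecognised host, rather than returning the input unchanged, makes a mistyped URL visible before the value is entered in the dialog.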
4. Delta Lake Defaults
Complete the following details:
- Endpoint/Cluster: Use the drop-down menu to select an existing Databricks Cluster to connect to within your Databricks Workspace.
- Catalog: Use the drop-down menu to select an existing Databricks Unity Catalog to connect to.
- Database: Use the drop-down menu to select the Databricks Database to connect to.
Click Finish to create your project and environment.
Cluster states
In Matillion ETL, each cluster in the Endpoint/Cluster drop-down menu is assigned a state, which corresponds to one or more Databricks states. See the table below for more information:
Matillion ETL | Databricks |
---|---|
Stopped | Terminated, Stopped, Terminating, Stopping, Deleting |
Starting up | Pending, Restarting, Starting |
Running | Started, Running, Resizing |
Error | Unknown, Deleted, Error |
When a cluster is not running, databases won't be retrieved, and the Database drop-down menu won't offer any selections. Attempting to select a database on a cluster that displays a Stopped state will automatically trigger that cluster to start. It can take a few minutes for the cluster to move from Stopped to Running, and it will be in the Starting up state during this time.
Clicking Previous and then returning to Delta Lake Defaults refreshes the state of the clusters. This is required to see when a cluster has transitioned from Stopped to Starting up or Running. Refreshing the cluster states also reloads the Database drop-down menu.
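The state table above can be expressed as a simple lookup, which may be useful when reasoning about whether the Database drop-down menu will be populated. This is a sketch of the table only, not of how Matillion ETL is implemented. The names `MATILLION_TO_DATABRICKS`, `matillion_state`, and `databases_available` are hypothetical, and the case-insensitive matching is an assumption.

```python
# Matillion ETL state -> Databricks states, copied from the table above.
MATILLION_TO_DATABRICKS = {
    "Stopped": ("Terminated", "Stopped", "Terminating", "Stopping", "Deleting"),
    "Starting up": ("Pending", "Restarting", "Starting"),
    "Running": ("Started", "Running", "Resizing"),
    "Error": ("Unknown", "Deleted", "Error"),
}

# Inverted lookup: lowercase Databricks state -> Matillion ETL state.
DATABRICKS_TO_MATILLION = {
    databricks.lower(): matillion
    for matillion, databricks_states in MATILLION_TO_DATABRICKS.items()
    for databricks in databricks_states
}


def matillion_state(databricks_state: str) -> str:
    """Return the Matillion ETL state for a Databricks state (KeyError if unlisted)."""
    return DATABRICKS_TO_MATILLION[databricks_state.strip().lower()]


def databases_available(databricks_state: str) -> bool:
    """Databases are only retrieved when the cluster is shown as Running."""
    return matillion_state(databricks_state) == "Running"


print(matillion_state("Terminating"))
print(databases_available("Pending"))
```

A state that does not appear in the table raises `KeyError` rather than being guessed at.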
Next steps
When you first log in to Matillion ETL, we recommend you replace your default username and password with your own secure login credentials. For more information about changing these credentials, read User Configuration in the Admin Menu.
Useful Links
- If you're new to Matillion ETL, we recommend reading the documentation on UI and Basic Functions.
- For more information about connecting to your instance, read Accessing the Matillion ETL Client (AWS) or Accessing the Matillion ETL Client (Azure).
- For more information about adding credentials, read Manage Credentials.