Environments
An environment in Matillion ETL describes how your instance connects to a data warehouse, as well as which sets of cloud platform credentials to use. Multiple environments can be set up at a user level, meaning different users can use different environments on a single Matillion ETL instance. When a user runs a job, the job runs within the environment currently in use by that user.
For Hub users, the maximum number of environments per instance is 9999.
Before starting any job within Matillion ETL, you must set up an environment as described on this page.
Creating and editing environments
To create a new environment, click Project → Add Environment. This will open the Create Environment wizard.
All environments defined in the current Matillion ETL instance are listed in the Environments panel in the lower left of the UI. The currently selected environment is labelled [Selected].
To switch to a different environment, right-click the desired environment and then click Select Environment.
To edit an environment's details, right-click the environment name and then click Edit Environment.
You can also add a new environment from here by right-clicking an environment and then clicking Add Environment.
The Create Environment and Edit Environment wizards both show the same details. Each has three pages (except for Matillion ETL for BigQuery, which has a single page), as follows:
- Cloud Platform Connection: Specific cloud platform credentials used by this environment.
- Data Warehouse Connection: Connection details that allow Matillion ETL to use your data warehouse (Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics, or Delta Lake on Databricks).
- Data Warehouse Defaults: Default resources for Matillion ETL to use, such as buckets, schemas, and databases.
Each of these wizard pages is described in full further down this page.
Note
The Create Environment and Edit Environment wizards look slightly different depending on your Matillion ETL product version, as it depends on the native cloud platform and data warehouse in use. This page lists the details that need to be completed in these wizards for each specific Matillion ETL product.
Cloud platform connection
This section covers which cloud platform credentials your Matillion ETL instance will use. By default, Matillion ETL will attempt to use the instance credentials to discover available resources. However, you can define alternative credentials within Matillion ETL. Read Manage Credentials for more information.
- Environment Name: A unique name to identify the environment, e.g. "Test" or "Live". This is mandatory.
- AWS Credentials: Choose a set of AWS Credentials to use with this environment.
- GCP Credentials: Choose a set of GCP Credentials to use with this environment.
- Azure Credentials: Choose a set of Azure Credentials to use with this environment.
You only need to enter the credentials for the platform you are using, leaving the other two blank.
From this page you can click Manage to open the Manage Credentials dialog to edit existing credentials or create new credentials.
Click Next to move to the second page of the wizard, Data Warehouse Connection.
Google BigQuery connection
For Google BigQuery connections, this screen has two additional fields:
- Default Project: Select a Google Cloud project from the drop-down list.
- Default dataset: When you have selected a project, the drop-down list for this field will be populated with the datasets available to that project. Select the required dataset.
After entering the connection details, you can check that the connection to BigQuery is established by clicking Test.
Data warehouse connection
In this section, you must add connection and authorization details for your data warehouse. An environment can only be configured to use a single data warehouse, and the type of data warehouse is limited by your Matillion ETL product version (e.g. Matillion ETL for Snowflake can only use Snowflake as a data warehouse). The details that must be completed on this second page of the wizard depend on the data warehouse you are using, as follows.
Snowflake connection
-
Account: The name of your Snowflake account. More information about gathering Snowflake account details can be found in Create Project (Snowflake).
Warning
Matillion ETL alters Snowflake account names that include an underscore and replaces the underscore with a hyphen. Please use hyphens instead of underscores in your Snowflake account name.
The following is a direct quote from the Snowflake documentation on this topic:
URLs that include underscores can sometimes cause issues for certain features, such as Okta SSO/SCIM. For this reason, Snowflake also supports a version of the account name that substitutes the hyphen character (-) in place of the underscore character. For example both of the following URLs are supported:
URL with underscores:
https://acme-marketing_test_account.snowflakecomputing.com
URL with dashes:
https://acme-marketing-test-account.snowflakecomputing.com
-
Username: Username for the environment connection.
- Password Type: Choose between Password or Private Key to authenticate your Snowflake connection.
- Password or Passphrase: Depending on the chosen Password Type, enter a password or a private key passphrase for the environment connection. The password is selected from those stored using the Password Manager feature, which you can access by clicking Manage.
- Advanced Connection Settings: This allows you to enter additional JDBC connection parameters if required. Click Manage to open the JDBC Connection Attributes dialog, where you can enter parameters as name:value pairs.
Delta Lake on Databricks connection
- Workspace: The full URL you use to access your Databricks portal, for example
https://yourcompany.cloud.databricks.com
. - Username: A username used to access your Delta Lake resources.
- Password: The password for the environment connection. The password is selected from those stored using the Password Manager feature, which you can access by clicking Manage.
After entering the connection details, you can check that the connection to Delta Lake is established by clicking Test.
Amazon Redshift connection
- Redshift Cluster: If you have created an IAM Role, the drop-down will list the Amazon Redshift clusters that are visible to that role in the region where the instance is running. Selecting that cluster will populate many of the connection settings fields on this screen with sensible defaults. If no cluster is selected, these fields will need to be set manually.
- Endpoint: The physical address of the leader node. This will be either a name or an IP address.
- Port: This is usually
5439
or5432
, but it can be configured differently when setting up your Amazon Redshift cluster. - Username: The username for the environment connection.
- Password: The password for the environment connection. The password is selected from those stored using the Password Manager feature, which you can access by clicking Manage.
- Enable SSL: Select this to encrypt communications between Matillion ETL and Amazon Redshift. Some Amazon Redshift clusters may be configured to require this.
- Advanced Connection Settings: This allows you to enter additional JDBC connection parameters if required. Click Manage to open the JDBC Connection Attributes dialog, where you can enter parameters as name:value pairs.
Azure Synapse Analytics connection
- Endpoint: The server name of your Microsoft SQL database.
- Port: The port number of your Microsoft SQL database (
1433
by default). - Username: A username used to access your Microsoft SQL database.
- Password: The corresponding password for the above username. The password is selected from those stored using the Password Manager feature, which you can access by clicking Manage.
- Advanced Connection Settings: This allows you to enter additional JDBC connection parameters if required. Click Manage to open the JDBC Connection Attributes dialog, where you can enter parameters as name:value pairs. Read Setting the connection properties for more information.
Data warehouse defaults
In this section, you specify the resources from your cloud data warehouse that Matillion ETL should use by default.
Snowflake defaults
- Default Role: Select a role. For more information, read Overview of Access Control.
- Default Warehouse: Select the environment's default Snowflake warehouse. For more information, read Overview of Warehouses.
- Default Database: Select the environment's default Snowflake database. For more information, read Databases, Tables, and Views.
- Default Schema: Select the Environment's default schema. For more information, read the Snowflake documentation.
- Concurrent Connections: Specify the maximum number of database connections to allow concurrently. Multiple concurrent connections can potentially speed up large Matillion ETL jobs. The maximum number of connections possible is equal to the number of threads the instance has at its disposal. See our short video tutorial for more details. This is an Enterprise Mode only feature.
After entering the connection defaults, you can check that the connection to Snowflake is established by clicking Test.
Delta Lake on Databricks defaults
- Endpoint/Cluster: The Databricks cluster that Matillion ETL will connect to.
- Catalog: Choose an appropriate Databricks Unity Catalog from the drop-down menu.
- Database: The name of the Delta Lake database that Matillion ETL will connect to.
- Concurrent Connections: Select or enter the number of concurrent connections per job, to a maximum of 16.
Amazon Redshift defaults
- Database: The database you created when setting up your Amazon Redshift cluster. You may run with multiple database names—in which case, choose the one you want to use for this environment.
- Default Schema: This is public by default, but if you have configured multiple schemas within your Amazon Redshift database, you should specify the schema you want to use.
- Default Bucket: Specify the default S3 bucket to use. Many Matillion ETL components and features will use this bucket by default, but you can specify alternative buckets when you configure individual components.
- Concurrent Connections: Specify the maximum number of database connections to allow concurrently. Multiple concurrent connections can potentially speed up large Matillion ETL jobs. The maximum number of connections possible is equal to the number of threads the instance has at its disposal. See our short video tutorial for more details. This is an Enterprise Mode only feature.
After entering the connection defaults, you can check that the connection to Redshift is established by clicking Test.
Azure Synapse Analytics defaults
- Default Database: The database you created when setting up Azure Synapse Analytics. You may run with multiple database names—in which case, choose the one you want to use for this environment.
- Default Schema: Select the database schema you want to use by default.
- Concurrent Connections: Specify the maximum number of database connections to allow concurrently. Multiple concurrent connections can potentially speed up large Matillion ETL jobs. The maximum number of connections possible is equal to the number of threads the instance has at its disposal. See our short video tutorial for more details. This is an Enterprise Mode only feature.
After entering the connection defaults, you can check that the connection to Synapse is established by clicking Test.