Technical requirements

This article outlines the current technical requirements and limitations for using Batch loads in Data Loader.


Snowflake (AWS or Azure)

  • Data Loader doesn't support SSH tunneling or PrivateLink, so you must either have a publicly accessible Snowflake account or set up a publicly accessible SSH host that can forward traffic to the private cloud that holds the PrivateLink connection to Snowflake. We recommend using a publicly accessible Snowflake account where possible.
  • A Snowflake username and password for the instance used during testing (a connection check is sketched after this list).
  • Authentication for any third-party data sources. When configuring a data source, you will be prompted to grant Data Loader access to it, and you're free to choose which account to use during that authorization. This could be:
    • Usernames/passwords for JDBC-accessible databases.
    • OAuth for most others.
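
If you want to confirm that the Snowflake account is publicly reachable and the credentials work before configuring Data Loader, a short check along the lines below can help. This is a minimal sketch using the snowflake-connector-python package; the account identifier, username, and password are placeholders, not values from this article.

```python
# Minimal Snowflake connectivity check (placeholder credentials throughout).
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.eu-west-1",  # placeholder account identifier
    user="DATALOADER_TEST_USER",  # placeholder username
    password="********",          # placeholder password
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")
    print("Connected to Snowflake version:", cur.fetchone()[0])
finally:
    conn.close()
```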

Amazon Redshift

  • A shared job won't extract columns whose names contain uppercase letters. You may see an error message about NULL values, or the job may still complete but with NULL values in place of the data that wasn't extracted. To resolve this, set the Redshift parameter enable_case_sensitive_identifier to true, either by altering the user or by updating the Redshift parameter group (see the sketch after this list for examples of both).
  • Data Loader doesn't support SSH tunneling or PrivateLink, so you must either have a publicly accessible Redshift cluster or set up a publicly accessible SSH host that can forward traffic to the private cloud that Redshift is running inside.
  • We recommend using a separate Amazon Redshift cluster (a single dc2.large node should be sufficient) for testing; Matillion will reimburse reasonable charges incurred on submission of an AWS bill detailing the cluster used. Please don't test on 50 8XL nodes.
  • An AWS access key and secret key relating to an existing IAM role that can read/write to S3. S3 is used as a staging area and, although no objects are left behind permanently, Data Loader needs to read and write S3 objects temporarily during processing (a staging-access check is sketched after this list).
  • An Amazon Redshift username and password for the cluster used during testing.
  • Authentication for any third-party data sources. When configuring a data source, you will be prompted to grant Matillion access to it, and you're free to choose which account to use during that authorization. This could be:
    • Usernames/passwords for JDBC-accessible databases.
    • OAuth for most others.
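
To illustrate the case-sensitivity fix above, the sketch below shows both approaches: altering the user over a standard PostgreSQL-protocol connection, and updating a cluster parameter group with boto3. The endpoint, credentials, user, and parameter group names are placeholders, and it assumes the psycopg2 and boto3 packages are installed.

```python
# Option 1: enable case-sensitive identifiers for a specific Redshift user.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.eu-west-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="admin_user",    # placeholder admin user
    password="********",  # placeholder password
)
conn.autocommit = True
with conn.cursor() as cur:
    # Takes effect for new sessions opened by this user.
    cur.execute("ALTER USER dataloader_user SET enable_case_sensitive_identifier TO true;")
conn.close()

# Option 2: enable it cluster-wide by updating the parameter group.
import boto3

redshift = boto3.client("redshift", region_name="eu-west-1")
redshift.modify_cluster_parameter_group(
    ParameterGroupName="dataloader-test-params",  # placeholder parameter group name
    Parameters=[
        {
            "ParameterName": "enable_case_sensitive_identifier",
            "ParameterValue": "true",
            "ApplyType": "dynamic",
        }
    ],
)
```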
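
Because S3 is used only as a temporary staging area, it's also worth confirming that the access key and secret key can round-trip an object before running a load. The following is a small sketch using boto3; the bucket name, object key, and credentials are placeholders.

```python
# Round-trip a small test object to confirm the keys can write, read, and delete in S3.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",       # placeholder access key
    aws_secret_access_key="********",  # placeholder secret key
)

bucket, key = "dataloader-staging-test", "connectivity-check.txt"  # placeholder bucket and key
s3.put_object(Bucket=bucket, Key=key, Body=b"staging check")
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
assert body == b"staging check"
s3.delete_object(Bucket=bucket, Key=key)  # clean up, mirroring the temporary use of S3
print("S3 staging write/read/delete OK")
```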

Google BigQuery

  • Data Loader requires a Google Service Account that's configured to use Google BigQuery and Google Cloud Storage (GCS). GCS is used as a staging area and, although no objects are left behind permanently, Data Loader needs to read and write GCS objects temporarily during processing (an access check is sketched after this list).
  • Authentication for any third-party data sources. When configuring a data source, you will be prompted to grant Matillion access to it, and you're free to choose which account to use during that authorization. This could be:
    • Usernames/passwords for JDBC-accessible databases.
    • OAuth for most others.
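
As a quick way to confirm the service account has the required access, the sketch below authenticates with a downloaded JSON key file and lists BigQuery datasets and GCS buckets. It assumes the google-cloud-bigquery and google-cloud-storage packages are installed; the key file path is a placeholder.

```python
# Verify that a service account key can reach both BigQuery and Cloud Storage.
from google.cloud import bigquery, storage

KEY_PATH = "service-account-key.json"  # placeholder path to the downloaded key file

bq = bigquery.Client.from_service_account_json(KEY_PATH)
gcs = storage.Client.from_service_account_json(KEY_PATH)

print("BigQuery datasets:", [d.dataset_id for d in bq.list_datasets()])
print("GCS buckets:", [b.name for b in gcs.list_buckets()])
```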

Delta Lake on Databricks

  • Data Loader doesn't support SSH tunneling or PrivateLink, so you must either have a publicly accessible Databricks account or set up a publicly accessible SSH host that can forward traffic to the VPC that Databricks is running inside. We recommend using a publicly accessible Databricks account where possible (a reachability check is sketched after this list).
  • Databricks username and password for the instance used during testing.
  • Authentication for any third-party data sources. When configuring a data source, you will be prompted to grant Data Loader access to it, and you're free to choose which account to use during that authorization. This could be:
    • Usernames/passwords for JDBC-accessible databases.
    • OAuth for most others.
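
Because Data Loader needs to reach the Databricks workspace over the public internet, a simple reachability check like the one below can confirm that no PrivateLink-only setup is in the way. This is a sketch using only Python's standard library; the workspace hostname is a placeholder, and the check only verifies that HTTPS traffic can reach the workspace, not that the credentials are valid.

```python
# Check that the Databricks workspace is reachable on port 443 from the public internet.
import socket

WORKSPACE_HOST = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace hostname

try:
    with socket.create_connection((WORKSPACE_HOST, 443), timeout=10):
        print(f"{WORKSPACE_HOST} is reachable on port 443")
except OSError as exc:
    print(f"Could not reach {WORKSPACE_HOST}: {exc}")
```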