Matillion ETL security best practices
This page describes methods for enhancing the security of your Matillion ETL instances. Specifically, this page focuses on three security principles: confidentiality, integrity, and availability.
If you encounter any problems with the methods described on this page, please contact our support team.
Confidentiality
Access to data should only be possible after authentication, and should be subject to some level of authorization.
Matillion recommendations
- Protect your running Matillion ETL instances with a firewall that grants only the minimum access necessary. Matillion has a web service that checks whether your instance is available to the world. If it is, the firewall is usually overly permissive, and you'll see the warning "Your copy of Matillion ETL is publicly available" in the notices window.
- We recommend that you configure Matillion ETL with the minimum environment connection permissions possible. Please be aware that cloud platform (AWS, Azure, and GCP) metadata endpoints are always accessible, and may allow users to retrieve privileged information, including SSO keys.
- Use HTTPS rather than HTTP. Matillion ships with a self-signed SSL certificate that is perfectly functional, but that causes your browser to issue an "un-trusted" warning. You can upload your own certificate and associated private key if you wish.
- Enable SSL for JDBC communications between Matillion and all data sources and targets. This is sometimes a JDBC connection option, which you can set using component parameters. It's sometimes a property of the source database, which needs to be configured by the database administrator. For Snowflake and Amazon Redshift, there's an "Enable SSL" option in the Matillion ETL environment.
- When using components that output to permanent cloud storage, choose the option to enable encryption at rest.
- Take advantage of the authentication and authorization options of the target database, especially using a strong username/password combination.
- Don't set up an environment using a powerful "administrator" user. Instead, use a minimally privileged ordinary database user.
- Use the Matillion password manager to store passwords. Choose the option to use KMS encryption rather than the default encoding (which is obfuscation).
- Keep password access to Matillion ETL enabled. You can use a local database or switch to an existing LDAP server.
- Use project ACLs to enable or disable access for Matillion users.
- Matillion ETL has an authorization model. Don't use generic user names, and grant minimum privileges to each individual user.
- Do not use 'None' user authentication. All access to Matillion ETL instances should be protected by requiring a user-specific username and password, whether managed internally or by OpenID or LDAP. Stateless authentication can be used to manage these settings on HA clusters.
- Use "top-level" data acquisition components where possible, i.e. those that have their own dedicated orchestration component. These almost all link into Matillion's OAuth credentials management system, which allows you to secure connectivity using OAuth.
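The firewall recommendation at the top of this list can be checked before an instance ever triggers the "publicly available" warning. The following is a minimal sketch, not a Matillion feature: it assumes your ingress rules have been exported into a simple list of dictionaries (the `port` and `source` keys are a hypothetical format), and flags any rule whose source range covers the whole address space.

```python
import ipaddress

# Hypothetical rule format: one dict per ingress rule, with a port and a source CIDR.
def world_open_rules(rules):
    """Return the rules whose source range covers every address."""
    flagged = []
    for rule in rules:
        network = ipaddress.ip_network(rule["source"], strict=False)
        # A prefix length of 0 means the entire IPv4 or IPv6 address space.
        if network.prefixlen == 0:
            flagged.append(rule)
    return flagged

# Example rules using documentation-reserved address ranges.
rules = [
    {"port": 443, "source": "0.0.0.0/0"},
    {"port": 443, "source": "203.0.113.0/24"},
    {"port": 22, "source": "198.51.100.7/32"},
]

for rule in world_open_rules(rules):
    print(f"port {rule['port']} is open to the world ({rule['source']})")
```

A check like this only looks at the rule list you give it; it does not replace the warning shown in the notices window.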
Integrity
Data should be protected from incorrect modification.
Matillion recommendations
- Don't change data during load. This will ensure that the data transformation jobs are working from an accurate copy of the source data.
- Use Matillion ETL's documentation feature to document and publish the ELT designs.
- Implement a testing process that involves testing all orchestration and transformation jobs on representative (and ideally full-volume) source data.
- Have deployment procedures and a version control methodology.
- Use the Matillion ETL audit trail feature to monitor changes to the jobs.
- Take advantage of database transactions. This can help ensure that multipart data transformations are either completely successful, or else fail completely without leaving data partially updated.
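The effect of wrapping a multipart transformation in a transaction can be illustrated outside Matillion ETL. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the target database; the `staging` and `target` tables and the `move_rows` function are hypothetical. A failure between the two steps rolls back the first step, so no partial update is left behind.

```python
import sqlite3

def move_rows(conn, fail=False):
    """Copy staged rows into the target and clear staging as one unit."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises an exception.
    with conn:
        conn.execute("INSERT INTO target SELECT id, amount FROM staging")
        if fail:
            raise RuntimeError("simulated failure between steps")
        conn.execute("DELETE FROM staging")

def count(conn, table):
    """Return the number of rows in a table (table names here are fixed literals)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE target (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
conn.commit()

# First attempt fails midway: the insert into target is rolled back.
try:
    move_rows(conn, fail=True)
except RuntimeError:
    pass
after_failure = (count(conn, "target"), count(conn, "staging"))

# Second attempt succeeds: both steps are committed together.
move_rows(conn)
after_success = (count(conn, "target"), count(conn, "staging"))

print("after failure (target, staging):", after_failure)
print("after success (target, staging):", after_success)
```

Transaction semantics vary between databases, so check how your target platform handles DDL statements and autocommit before relying on this pattern.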
Availability
Ensure that information can be accessed when appropriate and required.
Matillion recommendations
- Control access to the transformed data by using a different database user than the one used by Matillion ETL. This will help ensure that reports or analytics don't get accidentally run against the wrong dataset, or against data that hasn't yet been fully transformed.
- Use one of Matillion ETL's three backup methods: export/import, API export, and root volume backups (AWS only).
- Have disaster recovery procedures and test them.
- Keep software up to date: monitor the Notices panel for "updates available" messages.
Network protection and firewalling
Minimizing the number of machines that can attempt to access valuable infrastructure reduces the potential attack surface.
Matillion recommendations
- If using an external persistence database, which we recommended in our "integrity" section above, then the persistence store should only accept connections from the expected hosts: the Matillion ETL instances and any machines used for database administration. You can enforce this using firewall rules or VPC settings.
- If using a load balancer to distribute connections between nodes in an HA deployment, then the nodes should only accept incoming HTTPS connections from the load balancer. SSH connections to individual nodes will probably be required for administration, but the nodes will not need to accept any other incoming network traffic.
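The node rule described above amounts to an allowlist: HTTPS only from the load balancer, SSH only from administration hosts, and nothing else. The following is a rough sketch of how such a policy could be checked, assuming ingress rules exported in a hypothetical `port`/`source` dictionary format. The addresses in `ALLOWED` are placeholders, not values from any Matillion deployment.

```python
import ipaddress

# Placeholder addresses: substitute your own load balancer and admin ranges.
ALLOWED = {
    443: [ipaddress.ip_network("10.0.1.0/24")],  # load balancer subnet (HTTPS)
    22: [ipaddress.ip_network("10.0.9.5/32")],   # administration host (SSH)
}

def unexpected_rules(rules, allowed=ALLOWED):
    """Return the ingress rules that the allowlist does not cover."""
    unexpected = []
    for rule in rules:
        source = ipaddress.ip_network(rule["source"], strict=False)
        permitted = allowed.get(rule["port"], [])
        # A rule is acceptable only if its source range sits entirely
        # inside one of the ranges permitted for that port.
        covered = any(
            source.version == net.version and source.subnet_of(net)
            for net in permitted
        )
        if not covered:
            unexpected.append(rule)
    return unexpected

node_rules = [
    {"port": 443, "source": "10.0.1.0/24"},
    {"port": 22, "source": "10.0.9.5/32"},
    {"port": 8080, "source": "10.0.1.0/24"},
    {"port": 443, "source": "0.0.0.0/0"},
]

for rule in unexpected_rules(node_rules):
    print(f"unexpected ingress: port {rule['port']} from {rule['source']}")
```

The same approach could be applied to the persistence database by swapping in its port and the expected Matillion ETL and administration hosts.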