Diagnostic data policy
Overview
This article describes Matillion's policy and procedures concerning the handling of diagnostic data.
Diagnostic data is any data provided by a customer to the Matillion support team to assist in identifying and resolving a customer's issue while using the Matillion ETL product.
Diagnostic data policy
Matillion ETL includes some self-monitoring capabilities and may capture certain diagnostic data in the event of runtime failures. During problem investigation, Matillion support may request that a customer enable extra diagnostic information and send the resulting diagnostic data for analysis. Examples include:
- Java heap memory dumps.
- Component-level log traces.
- Operating system log files.
- Matillion ETL database backups.
- Other data as required by the support process.
Warning
These diagnostics capture the state of a running system at a point in time, and so may contain fragments of personally identifiable information or other personal or sensitive data.
Non Disclosure Agreement (NDA)
- Customers will be given the option to sign Matillion's mutual, support-specific NDA prior to sending any diagnostic data.
- If a customer is not able to sign Matillion's NDA, Matillion support may not be able to solve problems without diagnostic data. In some circumstances, Matillion may not be able to receive data without an NDA in place.
Please Note
Matillion's mutual NDA is available for download in the sidebar of this article.
Triggering the transfer of Diagnostic Data to Matillion
- By default, diagnostic data transfer to Matillion will happen only as a result of customer action. The customer will be given the option to "push" data to Matillion.
- By default, the software will never automatically push diagnostic data to Matillion or "call home" in any way. We will never (and in fact can never) "pull" diagnostic data from an instance of the software.
- If diagnostic data pushed to Matillion is ever automated, the customer will have to explicitly choose to opt in.
Data Security
- Matillion will provide appropriate mechanisms to ensure integrity and privacy of data in transit. All communication outside of the infrastructure holding the data is via HTTPS. All data is processed automatically and once uploaded, data does not leave the Amazon cloud environment.
- Diagnostic data will be encrypted at rest while on Matillion's cloud infrastructure.
- Matillion uses Amazon S3 to store and receive diagnostic data. Server-side encryption with Amazon S3-managed encryption keys (SSE-S3) uses strong multi-factor encryption. Amazon S3 encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it rotates regularly. Amazon S3 server-side encryption uses 256-bit Advanced Encryption Standard (AES-256).
- No Matillion employees have access to the encryption keys, and policy access to the Amazon S3 bucket is restricted to a limited administration team.
- Data will be received, stored, and processed in the EU-west region.
Please Note
You may need to install the AWS CLI (using pip install awscli) to move files to S3.
Retention
Matillion will permanently erase data held for diagnostics according to the following rules:
- Raw diagnostic data: 7 days.
- Automated analysis outputs: 30 days. Some data (such as hprof files) will be analysed by automatic tools, and the outputs of that analysis do not usually contain customer data directly.
- Manual analysis outputs: 1 day. Sometimes manual analysis is required, and Matillion will take steps to ensure that the outputs of manual analysis are deleted daily.
- For long running support issues, Matillion will inform the customer if it is necessary to extend the retention period of data.
Purpose and Usage of Data
- Matillion will not request more diagnostic data than is needed for analysis.
- Diagnostic data will only be used for support and problem diagnosis.
Sharing
- Matillion staff are prevented from easily copying diagnostic data to their company devices.
- Matillion will always seek the customer's approval before sharing data with subcontractors if it's necessary during problem diagnosis.
Erasure and Restriction
- The customer has the right to request that Matillion stop analysis and delete all diagnostic data. This can be done by contacting Matillion support. In such instances, Matillion may not be able to diagnose or help with any problem associated with that diagnostic data.
Procedure for Sending Diagnostic Data
- Raise a case with Matillion support and get a case reference number.
- Review Matillion's diagnostic data policy.
- Optionally sign Matillion's mutual NDA.
- In the event of a problem that requires analysis, Matillion support will request that you enable or directly send the relevant diagnostic data files (see details below).
- Await instruction from Matillion support on next steps.
Enabling/Finding Heap Dumps on Your Instance
For most instances, heap dump generation should be enabled by default. If it is not, or if you are unsure, it can be enabled by editing the file:
/etc/sysconfig/tomcat
Open the file and find the line containing:
JAVA_OPTS
And add the following options towards the end (please do not change the rest of the line):
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
Below is a "before and after" example of what the line might look like.
Before:
JAVA_OPTS=" -Djavax.net.ssl.trustStore=/usr/lib/jvm/jre/lib/security/cacerts -Djavax.net.ssl.trustStorePassword=changeit -Djava.security.egd=file:/dev/./urandom -XX:+UseG1GC -XX:OnOutOfMemoryError=/usr/share/emerald/WEB-INF/classes/scripts/oom.sh"
After:
JAVA_OPTS=" -Djavax.net.ssl.trustStore=/usr/lib/jvm/jre/lib/security/cacerts -Djavax.net.ssl.trustStorePassword=changeit -Djava.security.egd=file:/dev/./urandom -XX:+UseG1GC -XX:OnOutOfMemoryError=/usr/share/emerald/WEB-INF/classes/scripts/oom.sh -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
Save and close the file, then restart Tomcat.
Finally, if you suspect the Heap Dump is being written but you are not sure where, use the following command to find the location:
sudo find / -name "*.hprof" -type f 2>/dev/null
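The JAVA_OPTS edit described above can also be scripted. The following is a minimal sketch rather than an official Matillion tool: the function name add_heap_dump_opts is hypothetical, and it assumes that JAVA_OPTS is a single double-quoted line, as in the "before and after" example. It keeps a .bak copy of the file and changes nothing if the option is already present.

```shell
# Hypothetical helper (not part of Matillion ETL): append the heap dump
# options to the JAVA_OPTS line of a Tomcat sysconfig file.
# ASSUMPTION: JAVA_OPTS="..." sits on one line and ends with a closing quote.
add_heap_dump_opts() {
    conf_file="$1"
    opts='-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp'

    # Do nothing if heap dumps are already enabled.
    if grep -qF -e '-XX:+HeapDumpOnOutOfMemoryError' "$conf_file"; then
        echo "Heap dump options already present in $conf_file"
        return 0
    fi

    # Insert the options just before the closing quote; keep a .bak backup.
    sed -i.bak "/^JAVA_OPTS=/ s|\"[[:space:]]*\$| ${opts}\"|" "$conf_file"
}

# Usage (as root on the instance, then restart Tomcat):
#   add_heap_dump_opts /etc/sysconfig/tomcat
```

Comparing the file with its .bak copy before restarting Tomcat is a quick way to confirm that only the end of the JAVA_OPTS line changed.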
Removing HPROFs from the Heap Dump
The location is determined by the value provided for attribute XXXX in file tomcat.conf.
Assuming it is set to /tmp, use the following command to remove all HPROF files:
rm /tmp/*.hprof
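Because the deletion is permanent, it can help to review the list of files first. The sketch below is one possible way to do that. The helper names list_hprofs and remove_hprofs are hypothetical, and the directory defaults to /tmp on the same assumption as above.

```shell
# Hypothetical helpers: list, then delete, heap dumps in one directory only.
list_hprofs() {
    # Print each .hprof file directly inside the directory (default /tmp).
    find "${1:-/tmp}" -maxdepth 1 -type f -name '*.hprof'
}

remove_hprofs() {
    # Delete the same set of files that list_hprofs reports.
    find "${1:-/tmp}" -maxdepth 1 -type f -name '*.hprof' -exec rm -f {} +
}

# Usage:
#   list_hprofs /tmp      # review what would be removed
#   remove_hprofs /tmp    # then remove it
```

Unlike a bare glob, this form does not report an error when no .hprof files exist, and it never descends into subdirectories.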
Step One: Configuring Diagnostics
Your Matillion ETL instance is already configured to send data to Amazon S3.
Before moving to the next step, copy and paste the following command:
source /dev/stdin <<< "$(curl -s https://tempcredentials.matillion.com/)"
Please Note
This can also be run from a Bash Script Component in Matillion ETL.
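After running the command above, it may be worth confirming that credentials are present in the current shell. The sketch below rests on an assumption that this article does not confirm: that the sourced script exports the standard AWS CLI environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The function name check_aws_env is hypothetical.

```shell
# Hypothetical check: report whether the standard AWS CLI credential
# variables are set in the current shell.
# ASSUMPTION: the sourced script exports these variables; this article
# does not state which variables it sets.
check_aws_env() {
    missing=0
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "AWS_ACCESS_KEY_ID is not set"
        missing=1
    fi
    if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_SECRET_ACCESS_KEY is not set"
        missing=1
    fi
    if [ "$missing" -eq 0 ]; then
        echo "AWS credential variables are set"
    fi
    # Exit status 0 when both variables are set, 1 otherwise.
    return "$missing"
}
```

If the variables turn out to be named differently, ask Matillion support how to verify the configuration rather than relying on this check.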
Step Two: Sending Diagnostics To Matillion
All diagnostics that may contain sensitive data should be uploaded to Matillion using S3 via the Matillion ETL instance.
The general format for this should be as follows:
aws s3api put-object --bucket mtln-diagnostic-data --key <case_ref OR your_file_name>.hprof --body <your_file_name>.hprof --acl bucket-owner-full-control --region eu-west-1
Note: Users who do not have the AWS CLI installed will need to install it on their instance using:
pip install awscli
The provided bucket (mtln-diagnostic-data) has an anonymous upload policy, but it requires that the host be configured with any valid AWS access key. After uploading, please submit a case at the Matillion Support Portal with details of every file that has been uploaded.
Important: Multipart upload to this bucket is not supported. The AWS command may automatically enable multipart upload depending on your configuration and size of the uploaded file. This can be prevented by using 'gzip' on the file prior to upload, which is thus recommended for all uploads.
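The gzip step recommended above might look like the following sketch. The function name compress_diagnostic is hypothetical. It keeps the original file, which needs enough free disk space for both copies, and checks the archive before anything is uploaded.

```shell
# Hypothetical helper: compress a diagnostic file with gzip, keeping the
# original, and verify the archive before it is uploaded.
compress_diagnostic() {
    src="$1"
    # Write the compressed copy next to the original file.
    gzip -c "$src" > "$src.gz" || return 1
    # gzip -t exits non-zero if the archive is corrupt.
    gzip -t "$src.gz" || return 1
    # Print the path of the compressed file for use with --body.
    echo "$src.gz"
}

# Usage:
#   compress_diagnostic /tmp/<your_file_name>.hprof
```

The printed .gz path would then be passed to --body in the upload command. Whether the key should also carry the .gz suffix is not stated in this article, so confirm the naming with Matillion support.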
Example: Java Heap Memory Dumps
If you do not have the AWS CLI installed, first run:
pip install awscli
You must SSH into your Matillion ETL instance. Then:
cd /tmp
Look for any .hprof files, and upload them one by one:
aws s3api put-object --bucket mtln-diagnostic-data --key <case_ref OR your_file_name>.hprof --body <your_file_name>.hprof --acl bucket-owner-full-control --region eu-west-1
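When several heap dumps are present, the one-by-one upload can be sketched as a loop. This is an illustration only: the function name upload_hprofs is hypothetical, and prefixing each key with the case reference is an assumed naming choice so that files do not overwrite each other in the bucket.

```shell
# Hypothetical helper: upload every .hprof file in a directory, one
# put-object call per file, using the command format from this article.
# ASSUMPTION: the "<case_ref>_<file name>" key is an illustrative naming
# choice, not a documented Matillion convention.
upload_hprofs() {
    case_ref="$1"
    dir="${2:-/tmp}"
    for f in "$dir"/*.hprof; do
        # Skip the unexpanded pattern when no .hprof files exist.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        echo "Uploading $name"
        aws s3api put-object \
            --bucket mtln-diagnostic-data \
            --key "${case_ref}_${name}" \
            --body "$f" \
            --acl bucket-owner-full-control \
            --region eu-west-1 || return 1
    done
}

# Usage:
#   upload_hprofs <case_ref> /tmp
```

The loop stops at the first failed upload, so the files already sent can be listed accurately in the support case.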