Run a Bash script, redirecting any output it produces into the task message.
The script is executed in an external Bash process within your instance. Any errors encountered while running the script will immediately halt it.
To keep access via instance credentials under control, we advise limiting the scopes (permissions) granted to the instance wherever possible.
The instance runs a current Linux distribution, so the usual command line tools are available. The credentials stored in your current environment are also exported into the shell, so you can (and should) omit security keys from your scripts when calling APIs.
All the usual environment variables are available in the Bash environment, and any changes made to them are scoped to the current script execution; they are not visible afterward.
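As a minimal sketch of this scoping, a variable set in one script run is not carried into the next (the variable name here is illustrative, not one the component defines):

```shell
#!/bin/bash
# MY_TEMP_SETTING is a hypothetical variable name used for illustration.
# Setting or changing it here affects only this script run; the next
# execution starts from a fresh environment.
export MY_TEMP_SETTING="per-run-only"
echo "MY_TEMP_SETTING is ${MY_TEMP_SETTING}"
```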
Designer runs as the Tomcat user, so take care that this user has sufficient access to any resources the script needs and that scripts do not uninstall any customer-installed Bash libraries.
If you cancel a task while a Bash script is running, the script is killed. The script is also killed if the timeout is exceeded; the timeout ensures that scripts can never run forever, even if they enter an infinite loop or are blocked by an external resource.
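If one step of a script might block on an external resource, you can additionally bound that single step with the GNU coreutils `timeout` command inside the script, keeping it well under the component's Timeout property. A hedged sketch (the 2- and 5-second durations are illustrative):

```shell
#!/bin/bash
# Bound one potentially blocking step with GNU coreutils `timeout`
# (2 and 5 seconds are illustrative values). The component's own
# Timeout property still caps the whole script.
if timeout 2 sleep 5; then
    echo "step finished"
else
    # `timeout` exits with status 124 when the command is cut off
    echo "step timed out with status $?"
fi
```

This way a single slow call fails fast instead of consuming the whole component timeout.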
Currently, this component is not usable with a Matillion managed project.
Name = string
A human-readable name for the component.
Script = code editor
The Bash script to execute. Output from commands should be brief, as it is sent into the Task Status message.
Timeout = integer
The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 300 seconds (5 minutes).
Script best practices
In a Bash script, it's possible to stream the entire output of a large curl request to the log output. However, we don't recommend this practice: streaming all the data to the log means the data leaves your own infrastructure and is stored in log files on Matillion's Data Productivity Cloud platform, breaking data sovereignty.
Best practice, therefore, is to use the script to retrieve the data and then load it into an S3 bucket or some other storage of your choice, bypassing the Matillion logs.
A Bash script of the following form would do this:
aws s3 ls s3://bash-test
curl https://my.data.source/datafile | aws s3 cp - s3://bash-test/my-data.txt
aws s3 ls s3://bash-test
If your Bash script results in a log file of more than 300KB being written, the script will execute successfully but the component will truncate the log file and log the message:
*** WARNING: log output too large. Only returning the last 300KB **** : <contents>
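One way to stay under that limit is to redirect bulky command output to a file and print only a short summary line. A sketch, using `seq` as a stand-in for any verbose command:

```shell
#!/bin/bash
# Redirect bulky output to a file and print only a short summary,
# keeping the task log well under the 300KB limit.
# `seq` stands in for any command that produces a lot of output.
seq 1 100000 > /tmp/bulk_output.txt
echo "wrote $(wc -l < /tmp/bulk_output.txt) lines"
```

Only the one-line summary reaches the task message; the bulk of the output stays on the instance.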