Python Script

Run a Python script.

The script is executed in-process by an interpreter of the user's choice (Python2 or Python3). Any output written via print statements will appear as the task completion message, so output should be brief.

While it is valid to handle exceptions within the script using try/except, any uncaught exceptions will cause the component to be marked as failed and its failure link to be followed.
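As a minimal sketch of the distinction above, the script below catches a bad input itself, so no uncaught exception reaches the component (the function and sample values are illustrative, not part of the product):

```python
def to_int(value):
    # Handle bad input inside the script so an uncaught exception
    # does not mark the component as failed.
    try:
        return int(value)
    except ValueError:
        return None

results = [to_int(v) for v in ["1", "2", "oops", "4"]]
good = [r for r in results if r is not None]

# Keep printed output brief: it becomes the task completion message.
print("parsed %d of %d values" % (len(good), len(results)))
```

Had the script called `int("oops")` directly, the resulting ValueError would have caused the failure link to be followed instead.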

You may import any modules from the Python Standard Library. Optionally, you may also import your own Python libraries. To do this, specify the location of your libraries in the environment variable EXTENSION_LIBRARY_LOCATION. For more information, read Optional agent parameters. To use these additional Python libraries, include the appropriate imports in your Python script, following standard Python practice.
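A sketch of what this looks like in the script itself: standard library imports work unconditionally, while your own library (the module name `my_company_utils` below is a hypothetical placeholder) is importable only when EXTENSION_LIBRARY_LOCATION points the agent at its directory.

```python
import json  # standard library modules always import normally

# Hypothetical module of your own; it resolves only when the agent's
# EXTENSION_LIBRARY_LOCATION includes the directory containing it.
try:
    import my_company_utils
    HAVE_CUSTOM_LIBS = True
except ImportError:
    HAVE_CUSTOM_LIBS = False

print("custom libraries available:", HAVE_CUSTOM_LIBS)
```

The try/except guard is optional; a plain `import my_company_utils` is the normal form once the environment variable is configured.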

Note

AWS users only: For Python 2 and Python 3, the Boto and Boto3 APIs are made available to enable interaction with the rest of AWS. It is not recommended (or necessary) to put security keys in the script.
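As a sketch of what "no security keys in the script" looks like in practice, the function below constructs a boto3 client with no explicit credentials, relying on the agent's role (the call to the function is left commented because it only succeeds on an AWS agent with suitable permissions):

```python
def list_bucket_names():
    """Return S3 bucket names using the agent's inherited credentials.

    No access keys appear in the script; boto3 resolves credentials
    from the execution environment automatically.
    """
    import boto3  # preinstalled on AWS agents
    s3 = boto3.client("s3")  # no keys passed in
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

# On an AWS agent you would call it and print a brief summary:
# print("found %d buckets" % len(list_bucket_names()))
```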

If the component requires access to a cloud provider, it will by default inherit the agent's execution role (service account role). However, if there are cloud credentials associated with your environment, these will override the role.

Note

This component supports the use of pipeline and project variables. For more information, read Variables.

Warning

Currently, this component is not usable on Matillion Full SaaS infrastructure. See the Python Pushdown component for a way to run Python scripts in your own Snowflake environment if you have a Full SaaS solution.


Properties

Name = string

A human-readable name for the component.


Script = code editor

The Python script to execute.


Interpreter = drop-down

Select the Python interpreter to use: Python2 or Python3.


Timeout = integer

The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 300 seconds (5 minutes).
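Because termination at the timeout is forcible, a long-running script may want to track its own elapsed time and stop cleanly inside the limit. A sketch, with an assumed budget a little under the default 300-second timeout:

```python
import time

BUDGET_SECONDS = 250  # assumed self-imposed budget, inside a 300 s Timeout

start = time.monotonic()
processed = 0

for batch in range(10):  # stand-in for real work items
    if time.monotonic() - start > BUDGET_SECONDS:
        # Exit cleanly rather than being forcibly terminated.
        print("stopping early after %d batches" % processed)
        break
    processed += 1  # do the real work here

print("processed %d batches" % processed)
```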


Script best practices

In a Python script, it's possible to stream the entire output of a large curl-style request to the log output. We don't recommend this practice. Streaming all of the data to the log means the data leaves your own infrastructure and is stored in log files on Matillion's Data Productivity Cloud platform, breaking data sovereignty.

Best practice, therefore, is to use the script to retrieve the data and then load it into an S3 bucket or some other storage of your choice, bypassing the Matillion logs.
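The pattern can be sketched as follows. The retrieval step is stubbed with sample data (`fetch_records` is a hypothetical placeholder for your own API call), the payload goes to a local file rather than the log, and the S3 upload is shown as a comment because it requires an AWS agent and a real bucket:

```python
import json
import tempfile

def fetch_records():
    # Stand-in for your real retrieval step (e.g. an HTTP request).
    return [{"id": i} for i in range(1000)]

records = fetch_records()

# Write the full payload to a local file instead of printing it,
# so the data never enters the Matillion logs.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(records, fh)
    local_path = fh.name

# Then upload to your own storage, for example (AWS agents, boto3):
#   import boto3
#   boto3.client("s3").upload_file(local_path, "my-bucket", "exports/records.json")

# Print only a brief summary as the completion message.
print("wrote %d records to %s" % (len(records), local_path))
```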

If your Python script results in a log file of more than 300 KB being written, the script will execute successfully, but the component will truncate the log file and log the message *** WARNING: log output too large. Only returning the last 300 KB **** : <contents>
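Rather than relying on the component's truncation, a script can cap its own output. A sketch, trimming to the tail as the component itself does (the oversized string is a stand-in for a large response body):

```python
LOG_LIMIT = 300 * 1024  # bytes; the documented truncation threshold

output = "x" * (LOG_LIMIT * 2)  # stand-in for an oversized message

if len(output) > LOG_LIMIT:
    output = output[-LOG_LIMIT:]  # keep only the tail before printing

print("final output length: %d" % len(output))
```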


Applies to: Snowflake, Databricks, Amazon Redshift.