
Python Script

Run a Python script. The script is executed in-process by the Python 3 interpreter.

Note

The version of Python currently used by the component is 3.10.

Any output written via print statements will appear as the task completion message, and so output should be brief.
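
For example, a short single-line summary works well as the completion message (the row count below is purely illustrative):

```python
# Keep printed output brief: whatever is printed becomes the task completion message.
row_count = 42  # illustrative value, as if computed earlier in the script
print(f"Processed {row_count} rows")
```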

While it is valid to handle exceptions within the script using try/except, any uncaught exceptions will cause the component to be marked as failed and its failure link to be followed.
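
For example, you might handle errors you consider recoverable, and deliberately raise for anything that should fail the component (the URL and field name below are placeholders):

```python
import json
import urllib.request
import urllib.error

REQUIRED_KEY = "rows"  # illustrative field name

try:
    # Placeholder URL; replace with your own endpoint.
    with urllib.request.urlopen("https://example.com/api/summary", timeout=10) as response:
        payload = json.loads(response.read())
except urllib.error.URLError as err:
    # Handled here, so this error alone does not fail the component.
    print(f"Request failed, continuing without the summary: {err}")
else:
    if REQUIRED_KEY not in payload:
        # An uncaught exception marks the component as failed
        # and causes its failure link to be followed.
        raise ValueError(f"Response is missing the '{REQUIRED_KEY}' field")
```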

You may import any module from the Python Standard Library. Optionally, you may also import your own Python libraries by specifying their location in the EXTENSION_LIBRARY_LOCATION environment variable. For more information, read Optional agent parameters. To use these additional libraries, include the appropriate import statements in your script, following standard Python practice.
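
To illustrate (my_custom_lib is a hypothetical package placed in the EXTENSION_LIBRARY_LOCATION directory):

```python
# Standard library modules are always available.
import json

# Hypothetical package made importable via EXTENSION_LIBRARY_LOCATION.
import my_custom_lib

result = my_custom_lib.run()  # hypothetical function in that package
print(json.dumps(result))     # keep the printed summary brief
```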

If the component requires access to a cloud provider, by default the component will inherit the agent's execution role (service account role). However, if there are cloud credentials associated with your environment, these will override the role.

This component supports the use of pipeline and project variables. For more information, read Variables.
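
As a sketch only, this assumes that pipeline variables are exposed to the script by name and can be updated through a context object, as in Matillion's documented Python interface; my_variable is a hypothetical variable name, so check the Variables documentation for the exact usage in your environment:

```python
# Read a pipeline variable (assumed to be exposed directly by name).
print(my_variable)

# Update the variable so downstream components see the new value
# (assumed context object; verify against the Variables documentation).
context.updateVariable("my_variable", "new value")
```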

Warning

Currently, this component is not usable on Matillion Full SaaS infrastructure. See the Python Pushdown component for a way to run Python scripts in your own Snowflake environment if you have a Full SaaS solution.


Properties

Name = string

A human-readable name for the component.


Script = code editor

The Python script to execute.


Interpreter = drop-down

Set this to Python 3.


Timeout = integer

The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 300 seconds (5 minutes).


Script best practices

In a Python script, it's possible to stream the entire output of a large curl request to the component's log. However, we don't recommend this practice. Streaming all of the data to the log means that the data leaves your own infrastructure and is stored in log files on Matillion's Data Productivity Cloud platform, breaking data sovereignty.

Best practice, therefore, is to use the script to retrieve the data and then load it into an S3 bucket or some other storage of your choice, bypassing the Matillion logs.
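
As a sketch, assuming the agent runs on AWS with write access to a bucket (the URL, bucket, and key below are placeholders), the script can stream the retrieved data directly into S3 using requests and boto3, both of which are in the default library list:

```python
import requests
import boto3

# Placeholder endpoint, bucket, and key; replace with your own.
SOURCE_URL = "https://example.com/export/data.csv"
BUCKET = "my-example-bucket"
KEY = "landing/data.csv"

s3 = boto3.client("s3")

# Stream the response body straight into S3 rather than printing it,
# so the data never passes through the component's log output.
with requests.get(SOURCE_URL, stream=True, timeout=60) as response:
    response.raise_for_status()
    response.raw.decode_content = True
    s3.upload_fileobj(response.raw, BUCKET, KEY)

print(f"Uploaded {KEY} to s3://{BUCKET}")  # brief completion message only
```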

If your Python script writes more than 300 KB of log output, the script will still execute successfully, but the component will truncate the log and write the following message: *** WARNING: log output too large. Only returning the last 300 KB **** : <contents>


Default libraries

The following libraries are available by default in the Python Script component:

a-d d-n n-s s-z
abc distutils nntplib slugify
agate doctest ntpath smtpd
aifc docutils nturl2path smtplib
annotated_types dsi_pydantic_shim numbers sndhdr
antigravity email numpy snowplow_tracker
apiclient encodings oauthlib socket
apt enum opcode socketserver
apt_inst errno openpyxl softwareproperties
apt_pkg et_xmlfile OpenSSL sortedcontainers
aptsources faulthandler operator soupsieve
argparse fcntl optparse spwd
array filecmp ordered_set sqlite3
asn1crypto fileinput os sqlparams
ast filelock ossaudiodev sqlparse
asynchat fnmatch packaging sre_compile
asyncio fractions pandas sre_constants
asyncore ftplib pandas_gbq sre_parse
atexit functools parsedatetime ssl
attr future pathlib stat
attrs gc pathspec statistics
audioop genericpath pdb string
autocommand getopt pickle stringprep
awscli getpass pickletools struct
azure_identity gettext pip subprocess
azure_keyvault_secrets gi pipes sunau
babel glob pkg_resources symtable
backports google_auth_httplib2 pkgutil sys
base64 google_auth_oauthlib platform sysconfig
bdb google_crc32c platformdirs syslog
binascii googleapiclient plistlib tabnanny
binhex graphlib poplib tarfile
bisect grp posix telnetlib
blinker grpc posixpath tempfile
boto3 grpc_status pprint termios
botocore gzip profile test
bs4 hashlib proto text_unidecode
builtins heapq pstats textwrap
bz2 hmac psycopg2 this
cachetools html pty threading
cairo http pwd thrift
calendar httplib2 py_compile time
certifi idna pyarrow timeit
cffi imaplib pyasn1 token
cgi imghdr pyasn1_modules tokenize
cgitb imp pyclbr tomli
chardet importlib pycparser tomlkit
charset_normalizer importlib_metadata pydantic trace
chunk inflect pydantic_core traceback
click inspect pydata_google_auth tracemalloc
cmath io pydoc tty
cmd ipaddress pydoc_data turtle
code isodate pyexpat typeguard
codecs itertools pygtkcompat types
codeop jeepney pyparsing typing
collections jinja2 pytimeparse typing_extensions
colorama jmespath pytz typing_inspection
colorsys json queue tzdata
compileall jsonschema quopri unicodedata
concurrent jsonschema_specifications random unittest
configparser jwt re uritemplate
contextlib keyring readline urllib
contextvars keyword redshift_connector urllib3
copy launchpadlib referencing uu
copyreg leather reprlib uuid
cProfile lib2to3 requests venv
crypt linecache requests_oauthlib wadllib
cryptography locale resource warnings
csv logging rlcompleter wave
ctypes lsb_release roman weakref
curses lxml rpds webbrowser
daff lz4 rsa wheel
databricks lzma runpy wsgiref
dataclasses mailbox s3transfer xdrlib
datetime mailcap sched xml
dateutil markupsafe scramp xmlrpc
db_dtypes marshal secrets xxlimited
dbm mashumaro secretstorage xxlimited_35
dbt math select xxsubtype
dbt_common mimetypes selectors yaml
dbt_extractor mmap setuptools zipapp
dbt_semantic_interfaces modulefinder shelve zipfile
dbus more_itertools shlex zipimport
decimal msgpack shutil zipp
deepdiff multiprocessing signal zlib
difflib netrc site zoneinfo
dis networkx sitecustomize
distro nis six

