
Python Script

Run a Python script. The script is executed in-process by the Python 3 interpreter.

Note

The version of Python currently used by the component is 3.10.

Any output written via print statements appears as the task completion message, so keep output brief.
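Because printed output becomes the task completion message, a short summary works better than dumping raw data. A minimal sketch, where `rows` is a hypothetical result set used for illustration:

```python
# "rows" stands in for whatever data the script has processed.
rows = [("a", 1), ("b", 2), ("c", 3)]

# Print a one-line summary rather than every record, so the task
# completion message stays brief.
summary = f"Processed {len(rows)} rows"
print(summary)
```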

While it is valid to handle exceptions within the script using try/except, any uncaught exceptions will cause the component to be marked as failed and its failure link to be followed.
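The distinction matters in practice: an exception caught inside the script lets the component succeed, while an uncaught one fails the component. A minimal sketch, where `fetch_value` and its lookup table are hypothetical:

```python
def fetch_value(source):
    # Hypothetical lookup used for illustration.
    lookup = {"region": "eu-west-1"}
    return lookup[source]

try:
    # "missing_key" is not in the lookup, so this raises KeyError.
    value = fetch_value("missing_key")
except KeyError:
    # Handled here, so the component is not marked as failed and
    # its failure link is not followed.
    value = "unknown"

print(f"Resolved value: {value}")
```

Had the KeyError been left uncaught, the component would have been marked as failed instead.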

You may import any modules from the Python Standard Library. Optionally, you may also import your own Python libraries by specifying their location in the environment variable EXTENSION_LIBRARY_LOCATION; for more information, read Optional agent parameters. To use these additional libraries, include the appropriate imports in your Python script, following standard Python practice.
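Standard library modules need no configuration; your own packages import the same way once EXTENSION_LIBRARY_LOCATION points at them on the agent. A minimal sketch (the `my_helpers` package name is hypothetical):

```python
# Standard library modules are always available.
import json
import math

result = math.sqrt(16)
print(json.dumps({"sqrt": result}))

# With EXTENSION_LIBRARY_LOCATION configured on the agent, a package of
# your own imports the same way; "my_helpers" is a hypothetical name.
# import my_helpers
```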

If the component requires access to a cloud provider, by default the component will inherit the agent's execution role (service account role). However, if there are cloud credentials associated with your environment, these will override that role.

This component supports the use of pipeline and project variables. For more information, read Variables.

Warning

Currently, this component is not usable on Matillion Full SaaS infrastructure. See the Python Pushdown component for a way to run Python scripts in your own Snowflake environment if you have a Full SaaS solution.


Properties

Name = string

A human-readable name for the component.


Script = code editor

The Python script to execute.


Interpreter = drop-down

Set this to Python 3.


Timeout = integer

The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 300 seconds (5 minutes).


Script best practices

In a Python script, it's possible to stream the entire output of a large HTTP request to the log output, but we don't recommend this practice. Streaming all of the data to the log means the data leaves your own infrastructure and is stored in log files on Matillion's Data Productivity Cloud platform, which can breach data sovereignty requirements.

Best practice, therefore, is to use the script to retrieve the data and then load it into an S3 bucket or some other storage of your choice, bypassing the Matillion logs.
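This pattern can be sketched with `requests` and `boto3`, both of which appear in the default library list below. The URL, bucket, and key here are hypothetical placeholders, and the imports are kept inside the function to emphasize that this is a sketch, not a definitive implementation:

```python
def transfer_to_s3(url, bucket, key):
    """Stream a remote file into S3 without printing its contents,
    so the payload never passes through the Matillion logs."""
    # requests and boto3 are both in the component's default library list.
    import boto3
    import requests

    response = requests.get(url, timeout=60, stream=True)
    response.raise_for_status()

    s3 = boto3.client("s3")
    s3.upload_fileobj(response.raw, bucket, key)
    return response.headers.get("Content-Length")

# Example call (hypothetical URL and bucket):
# size = transfer_to_s3("https://example.com/export.csv",
#                       "my-data-bucket", "exports/export.csv")
# print(f"Uploaded {size} bytes")  # brief summary only, not the data
```

Printing only a byte count at the end also keeps the task completion message brief, as recommended above.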

If your Python script writes more than 300 KB of log output, the script will still execute successfully, but the component will truncate the log file and log the message *** WARNING: log output too large. Only returning the last 300 KB **** : <contents>.


Default libraries

The following libraries are available by default in the Python Script component:

a-d d-n n-s s-z
abc doctest nntplib six
agate docutils ntpath slugify
aifc dsi_pydantic_shim nturl2path smtpd
annotated_types email numbers smtplib
antigravity encodings numpy sndhdr
apiclient enum oauthlib snowplow_tracker
apt errno opcode socket
apt_inst et_xmlfile openpyxl socketserver
apt_pkg faulthandler OpenSSL softwareproperties
aptsources fcntl operator sortedcontainers
argparse filecmp optparse soupsieve
array fileinput ordered_set spwd
asn1crypto filelock os sqlite3
ast fnmatch ossaudiodev sqlparams
asynchat fractions packaging sqlparse
asyncio ftplib pandas sre_compile
asyncore functools pandas_gbq sre_constants
atexit future parsedatetime sre_parse
attr gc pathlib ssl
attrs genericpath pathspec stat
audioop getopt pdb statistics
autocommand getpass pickle string
awscli gettext pickletools stringprep
babel gi pip struct
backports glob pipes subprocess
base64 google_auth_httplib2 pkg_resources sunau
bdb google_auth_oauthlib pkgutil symtable
binascii google_crc32c platform sys
binhex googleapiclient platformdirs sysconfig
bisect graphlib plistlib syslog
blinker grp poplib tabnanny
boto3 grpc posix tarfile
botocore grpc_status posixpath telnetlib
bs4 gzip pprint tempfile
builtins hashlib profile termios
bz2 heapq proto test
cachetools hmac pstats text_unidecode
cairo html psycopg2 textwrap
calendar http pty this
certifi httplib2 pwd threading
cffi idna py_compile thrift
cgi imaplib pyarrow time
cgitb imghdr pyasn1 timeit
chardet imp pyasn1_modules token
charset_normalizer importlib pyclbr tokenize
chunk importlib_metadata pycparser tomli
click inflect pydantic tomlkit
cmath inspect pydantic_core trace
cmd io pydata_google_auth traceback
code ipaddress pydoc tracemalloc
codecs isodate pydoc_data tty
codeop itertools pyexpat turtle
collections jeepney pygtkcompat typeguard
colorama jinja2 pyparsing types
colorsys jmespath pytimeparse typing
compileall json pytz typing_extensions
concurrent jsonschema queue typing_inspection
configparser jsonschema_specifications quopri tzdata
contextlib jwt random unicodedata
contextvars keyring re unittest
copy keyword readline uritemplate
copyreg launchpadlib redshift_connector urllib
cProfile leather referencing urllib3
crypt lib2to3 reprlib uu
cryptography linecache requests uuid
csv locale requests_oauthlib venv
ctypes logging resource wadllib
curses lsb_release rlcompleter warnings
daff lxml roman wave
databricks lz4 rpds weakref
dataclasses lzma rsa webbrowser
datetime mailbox runpy wheel
dateutil mailcap s3transfer wsgiref
db_dtypes markupsafe sched xdrlib
dbm marshal scramp xml
dbt mashumaro secrets xmlrpc
dbt_common math secretstorage xxlimited
dbt_extractor mimetypes select xxlimited_35
dbt_semantic_interfaces mmap selectors xxsubtype
dbus modulefinder setuptools yaml
decimal more_itertools shelve zipapp
deepdiff msgpack shlex zipfile
difflib multiprocessing shutil zipimport
dis netrc signal zipp
distro networkx site zlib
distutils nis sitecustomize zoneinfo

