Python Pushdown

Public preview

Execute a Python script using the Snowpark service in your Snowflake account.

A Python Pushdown script is executed by the Snowpark service in your Snowflake account, using the Python 3.10 interpreter. All standard libraries associated with Python 3.10 are available, as are libraries provided by Snowpark Anaconda, with the exception of those that require AWS or other credentials.

Validation of the Python syntax is handled by Snowpark at runtime. If validation fails, the script won't execute and the component will terminate with a failure condition.

Any output written via print statements appears as the task completion message, so keep output brief.

Pipeline and project variables can be used in the script. Note the following:

  • You can set variables within the script using regular expressions, date and numeric calculations, and interpolated values.
  • Date and time manipulation in variables (for example, datetime, timedelta, or similar) is supported, but note that datetime isn't supported as a data type in Data Productivity Cloud variables.
  • To learn how to manipulate variables through a Python script, read Using variables in scripts.
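As a brief sketch of the first two bullets, the script below derives a date-based value using a regular expression and a datetime/timedelta calculation. The file name, variable name, and the commented-out updateVariable call are illustrative; see Using variables in scripts for the actual variable API.

```python
import re
from datetime import datetime, timedelta

# Illustrative input; in a real pipeline this might come from a
# pipeline or project variable.
file_name = "export_20240131_final.csv"

# Regular expression: pull the 8-digit date stamp out of the file name.
stamp = re.search(r"\d{8}", file_name).group(0)

# Date calculation: the day after the stamped date. Convert back to a
# string before storing, because datetime isn't supported as a data
# type in Data Productivity Cloud variables.
next_day = (datetime.strptime(stamp, "%Y%m%d") + timedelta(days=1)).strftime("%Y-%m-%d")

print("next_day is:", next_day)  # appears in the task completion message
# context.updateVariable('nextDay', next_day)  # store for later components
```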

Your Snowflake account will need to be configured to allow it to execute Python Pushdown scripts. Configuration requirements are described in Snowflake configuration, below.

Note

  • The component can't currently be used to access external databases, but can execute SQL on the Snowflake database and access Snowflake DataFrames.
  • The component can't currently be used with the AWS SDK.

These restrictions are expected to be removed in a future version of the component.


Properties

Name = string

A human-readable name for the component.


External Access Integrations = dual listbox

To access external websites, the script will use your Snowflake-configured External Access Integrations. Use the dialog to select one or more valid integrations from your Snowflake account if you require this functionality.
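As a minimal sketch of what a script might do once an integration is selected, the function below fetches an HTTP status code using the standard library's urllib (so no extra package selection is needed). The host is hypothetical: if it isn't covered by a selected External Access Integration, Snowflake blocks the connection and the error branch runs instead.

```python
from urllib.request import urlopen

def fetch_status(url: str, timeout: int = 10):
    """Return the HTTP status for url, or a 'blocked' message if the
    host isn't reachable (e.g. not covered by a selected integration)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError as exc:  # URLError, refused connection, timeout
        return f"blocked: {exc}"

# Hypothetical host; replace with one allowed by your integration.
print(fetch_status("https://www.w3schools.com"))
```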


Packages = dual listbox

Use this property to specify which libraries will be imported into the Snowpark Anaconda environment when the script executes. The dialog lists every package that Anaconda supports, and selecting one is the equivalent of running pip install <package> in the Anaconda environment.

You must select every library you have referenced with an import statement in your script. For example, if you include import snowflake.connector in the script, then select the snowflake-connector-python package.

Selected packages are installed into the environment at script runtime. We strongly recommend that you only select packages that your script needs, as each additional package increases the script execution time.
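Note that the module you import doesn't always match the package name you must select. The mapping below shows a few common cases for illustration:

```python
# Module imported in the script -> package to select in the Packages
# dialog (names can differ from the module name):
import_to_package = {
    "snowflake.connector": "snowflake-connector-python",
    "bs4": "beautifulsoup4",
    "sklearn": "scikit-learn",
    "yaml": "pyyaml",
}

for module, package in import_to_package.items():
    print(f"import {module:<20} -> select package {package}")
```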


Python Script = code editor

The Python script to execute.


Script Timeout = integer

The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).


Snowflake configuration

To execute Python Pushdown scripts, certain security privileges are required within your Snowflake account. A Snowflake administrator will need to configure the following.

Grant user-defined function privileges

The user that runs the Python Pushdown component will need Snowflake privileges that allow the creation and dropping of user-defined functions. For more details, read Granting Privileges for User-Defined Functions.

Grant external website access

To allow access to external websites, use NETWORK RULE commands within Snowflake. These commands have the following form:

CREATE OR REPLACE NETWORK RULE gkc_w3schools_access_rule
    MODE = EGRESS
    TYPE = HOST_PORT
    VALUE_LIST = ('www.w3schools.com');

CREATE OR REPLACE NETWORK RULE gkc_bbc_access_rule
    MODE = EGRESS
    TYPE = HOST_PORT
    VALUE_LIST = ('www.bbc.co.uk');

Where VALUE_LIST specifies a website you need to grant access to.

These commands create network rules to specify which websites the users can access. By default there are no sites accessible, so one of these commands must be issued for each site. For more details read Network rules.

Next, create an external access integration for the set of network rules. This can also include secrets which specify OAuth or basic authentication credentials if the sites require that. Use the following command:

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION gkc_w3schools_access_integration
    ALLOWED_NETWORK_RULES = (gkc_w3schools_access_rule, gkc_bbc_access_rule)
    ENABLED = TRUE;   

Where ALLOWED_NETWORK_RULES is a list containing the names of each NETWORK RULE you specified in the first step.

Integrations link network rules to users/roles.

Finally, grant usage on the integration to a role, and grant that role to a user. This requires the following commands:

CREATE ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT USAGE ON INTEGRATION gkc_w3schools_access_integration TO ROLE PUSHDOWN_POC_EXTERNAL_ACCESS; 
GRANT ROLE PUSHDOWN_POC_EXTERNAL_ACCESS TO USER saas_etl_agent_user;

Replace gkc_w3schools_access_integration in this example with the name of the integration you previously created.

The warehouse connection will need to specify this role when running the Python Pushdown component, so best practice is to adopt Snowflake's recommendations on role hierarchy. Read Role hierarchy and privilege inheritance for details.


Accessing the Snowflake session

You can use a Python script to access the Snowflake session object that allows you to read from and write to the Snowflake database. Read the Snowflake documentation for details.

By default, the script will access the default schema, warehouse, and database for your environment, but you can override those defaults by including override commands in the script, for example:

session.use_warehouse("NEW_WAREHOUSE")
session.use_database("NEW_DATABASE")
session.use_schema("NEW_SCHEMA")

These overrides will apply within the current script execution, and will not reset the properties for any other component in the pipeline or persist after the script completes.

The following example script shows a database query that uses the session object to retrieve a single (aggregate) row of data, and store the result into a variable for use elsewhere in the pipeline:

queryresult = session.sql('select "Texter" from "19820_all_data_types" where "Integer"=123456').collect()
print("queryresult[0][0] is: ", queryresult[0][0])
context.updateVariable('scalarVar', queryresult[0][0])