Python Pushdown
Python Pushdown is an orchestration component that lets you execute a Python script using the Snowpark service in your Snowflake account.
A Python Pushdown script is executed using the Snowpark service in your Snowflake account, using the Python 3.10 interpreter. All standard libraries associated with Python 3.10 are available to use, as well as libraries provided by Snowpark Anaconda, with the exception of those that require AWS or other credentials.
Validation of the Python syntax is handled by Snowpark at runtime. If validation fails, the script won't execute and the component will terminate with a failure condition.
Any output written via print
statements will appear as the task completion message, and so output should be brief.
Pipeline and project variables can be used in the script. Note the following:
- You can set variables within the script using regular expressions, date and numeric calculations, and interpolated values.
- Date and time manipulation in variables (for example,
datetime
,timedelta
, or similar) is supported, but note thatdatetime
isn't supported as a data type in Data Productivity Cloud variables. - To learn how to manipulate variables through a Python script, read Using variables in scripts.
Your Snowflake account will need to be configured to allow it to execute Python Pushdown scripts. Configuration requirements are described in Snowflake configuration, below.
Note
- The component can't currently be used to access external databases, but can execute SQL on the Snowflake database and access Snowflake DataFrames.
- The component can't currently be used with the AWS SDK.
These restrictions are expected to be removed in a future version of the component.
Properties
Name
= string
A human-readable name for the component.
Warehouse
= drop-down (optional)
The Snowflake warehouse used to execute your chosen Python Script. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.
External Access Integrations
= dual listbox (optional)
To access external websites, the script will use your Snowflake-configured External Access Integrations. Use the dialog to select one or more valid integrations from your Snowflake account if you require this functionality.
Snowflake Secret
= dual listbox (optional)
Select any secrets associated with your selected External Access Integrations that you want to make available for use in your script. This allows your script to access external data sources that require a secret. Read Use a Snowflake secret, below.
In the Snowflake Secret dialog, select a secret from the drop-down in the Snowflake Secret column, and in the Snowflake Variable column enter the name of the variable that you want to assign the secret to. Repeat for as many secrets as you want to select, assigning each to a different variable.
These variables exist in the Python script only, and only for the duration of the script execution. They aren't associated with any project or pipeline variables.
If the component is configured to use a Snowflake secret and the secret is subsequently deleted from Snowflake, the component will fail validation until this property is corrected.
This property is only available if you have selected one or more External Access Integrations. If there are no secrets associated with the selected integrations, this property will be displayed, but the drop-down in the Snowflake Secret dialog will be empty.
Packages
= dual listbox (optional)
Use this property to specify which libraries will be imported into the Snowpark Anaconda environment when the script executes. The dialog lists every package that Anaconda supports, and selecting one is the equivalent of running pip install <package>
in the Anaconda environment.
You must select every library you have referenced with an import
statement in your script. For example, if you include import snowflake.connector
in the script, then select the snowflake-connector-python
package.
Selected packages are installed into the environment at script runtime. We strongly recommend that you only select package that your script needs, as each additional package will increase the script execution time.
The package you select will be the latest available version.
Python Version
= drop-down (optional)
Select the Python version you want to use for your script from the available options. The currently supported versions are:
- 3.9
- 3.10
- 3.11
Note
Available Python versions may change as Snowflake Snowpark adds or removes support for specific versions.
The selected Python Version
determines the available packages, and the Packages
options will automatically update when you change the Python Version
.
Python Script
= code editor
The Python script to execute.
Script Timeout
= integer
The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).
Note
Script Timeout
can't exceed the limit defined in the Snowflake internal query timeouts. For more information, read Query Timeouts in Snowflake.
Snowflake configuration
To execute Python Pushdown scripts, certain security privileges will be required within your Snowflake account. A Snowflake administrator will need to configure the following.
Grant stored procedure privileges
The user that runs the Python Pushdown component will need Snowflake privileges that allow the creation of stored procedures. For more details, read Understanding Caller's Rights and Owner's Rights Stored Procedures.
Grant external website access
To allow access to external websites, use NETWORK RULE
commands within Snowflake. These commands have the following form:
CREATE OR REPLACE NETWORK RULE gkc_w3schools_access_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('www.w3schools.com');
CREATE OR REPLACE NETWORK RULE gkc_bbc_access_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('www.bbc.co.uk');
Where VALUE_LIST
is a website you need to grant access to.
These commands create network rules to specify which websites the users can access. By default there are no sites accessible, so one of these commands must be issued for each site. For more details read Network rules.
Next, create an external access integration for the set of network rules. This can also include secrets which specify OAuth or basic authentication credentials if the sites require that. Use the following command:
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION gkc_w3schools_access_integration
ALLOWED_NETWORK_RULES = (gkc_w3schools_access_rule, gkc_bbc_access_rule)
ENABLED = TRUE;
Where ALLOWED_NETWORK_RULES
is a list containing the names of each NETWORK RULE
you specified in the first step.
Integrations link network rules to users/roles. For more information, read:
- CREATE EXTERNAL ACCESS INTEGRATION
- Creating and using an external access integration
- External API authentication and secrets
- External network access examples
Finally, grant usage on the integration to a role, and grant that role to a user. This requires the following commands:
CREATE ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT USAGE ON INTEGRATION gkc_w3schools_access_integration TO ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT ROLE PUSHDOWN_POC_EXTERNAL_ACCESS TO USER saas_etl_agent_user;
Replace gkc_w3schools_access_integration
in this example with the name of the integration you previously created.
The warehouse connection will need to specify this role when running the Python Pushdown component, so best practise would be to adopt Snowflake's recommendations on role hierarchy. Read Role hierarchy and privilege inheritance for details.
Use a Snowflake secret
To use a Snowflake secret within your script, you must create the secret within Snowflake and assign it to the external access integration. For details, read CREATE SECRET and External network access examples in the Snowflake documentation.
Accessing the Snowflake session
You can use a Python script to access the Snowflake session
object that allows you to read from and write to the Snowflake database. Read the Snowflake documentation for details.
By default, the script will access the default schema, warehouse, and database for your environment, but you can override those defaults by including override commands in the script, for example:
session.use_warehouse("NEW_WAREHOUSE")
session.use_database("NEW_DATABASE")
session.use_schema("NEW_SCHEMA")
These overrides will apply within the current script execution, and will not reset the properties for any other component in the pipeline or persist after the script completes.
The following example script shows a database query that uses the session
object to retrieve a single (aggregate) row of data, and store the result into a variable for use elsewhere in the pipeline:
queryresult = session.sql('select "Texter" from "19820_all_data_types" where "Integer"=123456').collect()
print("queryresult[0][0] is: ",queryresult[0][0])
context.updateVariable('scalarVar',queryresult[0][0])
Snowflake | Databricks | Amazon Redshift |
---|---|---|
✅ | ❌ | ❌ |