Python Pushdown
Python Pushdown is an orchestration component that lets you execute a Python script using the Snowpark service in your Snowflake account.
The script executes using the Python version selected in the Python Version property. All standard libraries associated with that Python version will be available to use, as well as libraries provided by Snowpark Anaconda, with the exception of those that require AWS or other credentials.
Validation of the Python syntax is handled by Snowpark at runtime. If validation fails, the script won't execute and the component will terminate with a failure condition.
Any output written via `print` statements will appear as the task completion message, and so output should be brief.
Pipeline and project variables can be used in the script. Note the following:
- You can set variables within the script using regular expressions, date and numeric calculations, and interpolated values.
- Date and time manipulation in variables (for example, `datetime`, `timedelta`, or similar) is supported, but note that `datetime` isn't supported as a data type in Data Productivity Cloud variables.
- To learn how to manipulate variables through a Python script, read Using variables in scripts.
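As a sketch of the date manipulation pattern described above: the snippet below derives a date string that a Python Pushdown script could then write back to a pipeline variable with `context.updateVariable()` (the variable name `run_date` is hypothetical). The value is converted to a string because `datetime` isn't a supported variable type.

```python
from datetime import datetime, timedelta

def compute_run_date(now):
    # Convert to a string, because datetime isn't a supported
    # Data Productivity Cloud variable type.
    return (now - timedelta(days=1)).strftime("%Y-%m-%d")

run_date = compute_run_date(datetime(2024, 5, 2))
# Inside an actual Python Pushdown script you would then write the
# value back to a (hypothetical) pipeline variable named run_date:
# context.updateVariable("run_date", run_date)
print(run_date)  # 2024-05-01
```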
Your Snowflake account will need to be configured to allow it to execute Python Pushdown scripts. Configuration requirements are described in Snowflake configuration, below.
Note
The component can't currently be used to access external databases, but can execute SQL on the Snowflake database and access Snowflake DataFrames. This restriction is expected to be removed in a future version of the component.
Properties
Warehouse
= drop-down (optional)
The Snowflake warehouse used to execute your Python script. The special value `[Environment Default]` uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.
External Access Integrations
= dual listbox (optional)
To access external websites, the script will use your Snowflake-configured External Access Integrations. Use the dialog to select one or more valid integrations from your Snowflake account if you require this functionality.
Snowflake Secret
= dual listbox (optional)
Select any secrets associated with your selected External Access Integrations that you want to make available for use in your script. This allows your script to access external data sources that require a secret. Read Create and retrieve Snowflake secrets using Python Pushdown, below.
In the Snowflake secret dialog, select a secret from the drop-down in the Snowflake secret column, and in the Snowflake variable column enter the name of the variable that you want to assign the secret to. Repeat for as many secrets as you want to select, assigning each to a different variable.
These variables exist in the Python script only, and only for the duration of the script execution. They aren't associated with any project or pipeline variables.
If the component is configured to use a Snowflake secret and the secret is subsequently deleted from Snowflake, the component will fail validation until this property is corrected.
This property is only available if you have selected one or more External Access Integrations. If there are no secrets associated with the selected integrations, this property will be displayed, but the drop-down in the Snowflake secret dialog will be empty.
Python Version
= drop-down (optional)
Select the Python version you want to use for your script. The default setting is currently `3.10`.
The Python version you select will determine which packages and their supported versions are available in the Packages parameter, and the options will automatically update when you switch the Python version.
Note
Available Python versions may change as Snowflake Snowpark adds or removes support for specific versions.
Packages
= grid editor (optional)
Use this property to specify which libraries will be imported into the Snowpark Anaconda environment when the script executes.
- Package Name: Select a package. The packages available depend on the selected Python version in the Python Version parameter.
- Package Version: Choose the version of the package to import.
The dialog lists every package that Anaconda supports, and selecting one is the equivalent of running `pip install <package>` in the Anaconda environment.
You must select every library you have referenced with an `import` statement in your script. For example, if you include `import snowflake.connector` in the script, then select the `snowflake-connector-python` package.
Selected packages are installed into the environment at script runtime. We strongly recommend that you only select packages that your script needs, as each additional package will increase the script execution time.
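As a quick sanity check before filling in the Packages grid, a sketch like the following (standard library only) lists the top-level modules a script imports. Note that standard-library modules such as `datetime` don't need a package entry, and an import name can differ from its package name (`snowflake.connector` maps to the `snowflake-connector-python` package).

```python
import ast

# A stand-in for your Python Pushdown script's source code.
SCRIPT = """
import snowflake.connector
import boto3
from datetime import datetime
"""

def imported_modules(source):
    # Walk the parsed script and collect the top-level module name
    # from every import and from-import statement.
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

print(sorted(imported_modules(SCRIPT)))  # ['boto3', 'datetime', 'snowflake']
```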
Script Location
= drop-down
- Component: Use the Python Script property to add your Python script to this component. This is the default setting.
- File: Use the Select File property to choose a `.py` file in your project. The drop-down will list all `.py` files from the repository that is connected to your project.
Select File
= drop-down
Choose a `.py` file in your project to run via this component. The drop-down will list all `.py` files from the repository that is connected to your project.
Double-click a `.py` file in the Files panel to open that file in the code editor.
Python Script
= code editor
Add your Python script into the code editor. This script is stored in the component.
Script Timeout
= integer
The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).
Note
`Script Timeout` can't exceed the limit defined in the Snowflake internal query timeouts. For more information, read Query Timeouts in Snowflake.
Snowflake configuration
To execute Python Pushdown scripts, certain security privileges will be required within your Snowflake account. A Snowflake administrator will need to configure the following.
Grant stored procedure privileges
The user that runs the Python Pushdown component will need Snowflake privileges that allow the creation of stored procedures. For more details, read Understanding Caller's Rights and Owner's Rights Stored Procedures.
Grant external website access
To allow access to external websites, use NETWORK RULE
commands within Snowflake. These commands have the following form:
CREATE OR REPLACE NETWORK RULE gkc_w3schools_access_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('www.w3schools.com');
CREATE OR REPLACE NETWORK RULE gkc_bbc_access_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('www.bbc.co.uk');
Where `VALUE_LIST` specifies a website you need to grant access to.
These commands create network rules to specify which websites the users can access. By default there are no sites accessible, so one of these commands must be issued for each site. For more details read Network rules.
Next, create an external access integration for the set of network rules. This can also include secrets which specify OAuth or basic authentication credentials if the sites require that. Use the following command:
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION gkc_w3schools_access_integration
ALLOWED_NETWORK_RULES = (gkc_w3schools_access_rule, gkc_bbc_access_rule)
ENABLED = TRUE;
Where `ALLOWED_NETWORK_RULES` is a list containing the names of each `NETWORK RULE` you specified in the first step.
Integrations link network rules to users/roles. For more information, read:
- CREATE EXTERNAL ACCESS INTEGRATION
- Creating and using an external access integration
- External API authentication and secrets
- External network access examples
Finally, grant usage on the integration to a role, and grant that role to a user. This requires the following commands:
CREATE ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT USAGE ON INTEGRATION gkc_w3schools_access_integration TO ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT ROLE PUSHDOWN_POC_EXTERNAL_ACCESS TO USER saas_etl_agent_user;
Replace `gkc_w3schools_access_integration` in this example with the name of the integration you previously created.
The warehouse connection will need to specify this role when running the Python Pushdown component, so best practice is to adopt Snowflake's recommendations on role hierarchy. Read Role hierarchy and privilege inheritance for details.
Create and retrieve Snowflake secrets using Python Pushdown
You can use the Python Pushdown component to access secrets you create in Snowflake. This lets you securely store sensitive information such as API keys or database credentials as Snowflake `SECRET` objects, and access these secrets programmatically inside your Data Productivity Cloud pipelines.
The process for this is:
- Set up Snowflake to hold secrets.
- Create secrets in Snowflake.
- Set up the Python Pushdown component.
- Create a Python script to use the secrets.
These steps are described below. The first two steps will typically have to be performed by your Snowflake administrator.
Set up Snowflake
Before creating secrets in Snowflake for use in the Data Productivity Cloud, ensure the following Snowflake configuration is in place.
- Create a network rule that allows outbound network traffic from Snowflake to the target API's hostname. The following SQL will create this rule:

    CREATE OR REPLACE NETWORK RULE <my_api_network_rule> MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('<api_hostname>');

    Where:

    - `<my_api_network_rule>` is the name of the network rule you want to create.
    - `<api_hostname>` is the hostname of the API you want to access. For example, to access `api.example.com`, use `VALUE_LIST = ('api.example.com')`.

    Note

    In this and other example commands shown in this section, the symbols `< >` indicate a placeholder that you should replace with real values. You should not include the `< >` symbols in the commands.
- Create an external access integration that allows Snowpark (and therefore the Python Pushdown component) to use the defined network rule. The following SQL will create this integration:

    CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION <my_api_access_integration> ALLOWED_NETWORK_RULES = (<my_api_network_rule>) ENABLED = TRUE;

    Where:

    - `<my_api_access_integration>` is the name of the access integration you want to create.
    - `<my_api_network_rule>` is the name of the network rule you created previously.

    You will need to specify this access integration when you set up the Python Pushdown component in the Data Productivity Cloud.
- Grant privileges to the execution role. The Snowflake role that the Data Productivity Cloud will use to execute the Python Pushdown script requires `USAGE` permission on the external access integration. The following SQL will grant this permission:

    GRANT USAGE ON INTEGRATION <my_api_access_integration> TO ROLE <your_matillion_snowflake_role>;

    Where:

    - `<my_api_access_integration>` is the name of the access integration you created previously.
    - `<your_matillion_snowflake_role>` is the Snowflake role the Data Productivity Cloud uses.

    Granting `USAGE` on secrets will be handled during secret creation.
Create secrets in Snowflake
You can create secrets in Snowflake using the `CREATE SECRET` SQL command. The following example SQL shows how to create a secret and grant usage to the Data Productivity Cloud:
CREATE OR REPLACE SECRET <secret_name>
TYPE = GENERIC_STRING
SECRET_STRING = '<secret_value>';
GRANT USAGE ON SECRET <secret_name> TO ROLE <your_matillion_snowflake_role>;
Where:

- `<secret_name>` is the name of the secret you want to create.
- `<secret_value>` is the value of the secret, such as an API key or password.
- `<your_matillion_snowflake_role>` is the Snowflake role the Data Productivity Cloud uses.
For more details on how to create and manage secrets, refer to Snowflake's documentation on Creating and managing secrets and API reference for access to secrets.
Set up the Python Pushdown component
Add a Python Pushdown component to your Data Productivity Cloud pipeline, and configure its properties as follows:
- External Access Integrations: Add the name of the external access integration previously created in Snowflake. For example, `<my_api_access_integration>`. This enables network calls from the Python script.
- Snowflake Secret: Select the Snowflake secrets that you previously created in Snowflake, and assign them to Python variables. For example, you might map the secret `aws_access_key` to a variable `snow_aws_access_key`. Once you've mapped the Snowflake secrets to variables, you can access them in your Python script.
- Packages: You need the `boto3` package for the specific example below. Your script may address a different integration, requiring different Python package dependencies.
Create a Python script to use the secrets
With the Snowflake secrets mapped to variables, you can access them in your Python Pushdown script. Here's an example of how to reference Snowflake secrets inside a Python script. This script is intended as an example only, and you should adapt it to your specific use case.
import boto3
import _snowflake
# Referencing Snowflake secrets stored as variables
aws_id = _snowflake.get_generic_secret_string('snow_aws_access_key')
aws_secret = _snowflake.get_generic_secret_string('snow_aws_access_secret')
secret_name = pipe_secret_name
secret_region = pipe_secret_region
# Creating an AWS session using the referenced secrets
session = boto3.Session(
aws_access_key_id=aws_id,
aws_secret_access_key=aws_secret
)
# Fetching secret value from Secrets Manager
client = session.client(service_name='secretsmanager', region_name=secret_region)
secret_response = client.get_secret_value(SecretId=secret_name)
# Storing the secret in a Snowflake variable
context.updateVariable('pipe_svc_acct', secret_response["SecretString"])
print(pipe_svc_acct)
In this example:
- The secrets are securely retrieved using the `get_generic_secret_string()` method from the `_snowflake` module, allowing for secure access to external systems like AWS.
- `aws_id` and `aws_secret` are populated by referencing the Snowflake secrets `snow_aws_access_key` and `snow_aws_access_secret`, and are variables that are private to this script, so their populated values are never visible outside the script.
Note
If you encounter errors such as `NameError: name 'snow_aws_access_key' is not defined`, double-check that the variable names mapped in the Snowflake Secret property match exactly with the ones used in the Python script.
Accessing the Snowflake session
You can use a Python script to access the Snowflake `session` object that allows you to read from and write to the Snowflake database. Read the Snowflake documentation for details.
By default, the script will access the default schema, warehouse, and database for your environment, but you can override those defaults by including override commands in the script, for example:
session.use_warehouse("NEW_WAREHOUSE")
session.use_database("NEW_DATABASE")
session.use_schema("NEW_SCHEMA")
These overrides will apply within the current script execution, and will not reset the properties for any other component in the pipeline or persist after the script completes.
The following example script shows a database query that uses the `session` object to retrieve a single (aggregate) row of data, and store the result in a variable for use elsewhere in the pipeline:
queryresult = session.sql('select "Texter" from "19820_all_data_types" where "Integer"=123456').collect()
print("queryresult[0][0] is: ",queryresult[0][0])
context.updateVariable('scalarVar',queryresult[0][0])
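Since `collect()` returns a list of `Row` objects, indexing `[0][0]` as above will fail if the query returns no rows. A minimal sketch of a guard for that case, using plain tuples to stand in for `Row` objects (which support the same positional indexing):

```python
def scalar_from_rows(rows, default=None):
    # rows stands in for the list that session.sql(...).collect()
    # returns. Guard against an empty result set before taking [0][0].
    if not rows:
        return default
    return rows[0][0]

rows = [("hello",)]                   # what collect() might return for one row
print(scalar_from_rows(rows))         # hello
print(scalar_from_rows([], "n/a"))    # n/a
```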
The main() function
Scripts in Python Pushdown don't make use of a default handler to call the `main()` function, as is the case in Snowflake Python Worksheets. In Python Pushdown, if you wish to use a `main()` function, you must explicitly call it in your script to execute it:

main()
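A minimal sketch of the pattern (the `main()` body here is a placeholder):

```python
def main():
    # Script logic goes here.
    return "done"

# Unlike a Snowflake Python Worksheet, Python Pushdown has no default
# handler, so the script must call main() explicitly:
result = main()
print(result)  # done
```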
| Snowflake | Databricks | Amazon Redshift |
|---|---|---|
| ✅ | ❌ | ❌ |