
Snowflake Vector Upsert

Public preview

The Snowflake Vector Upsert component lets you convert data stored in a Snowflake cloud data warehouse into vector embeddings, and then store these embeddings in a new Snowflake table. This allows you to use alternative embedding models (for example, OpenAI or Amazon Bedrock) instead of Snowflake's Cortex embedding.

The destination table must already exist; this component won't create it. The table must have a column to hold a copy of the source text and a column to hold the vector embeddings.

Note

You must create the destination table with a SQL query rather than with the Create Table component. Make sure the vector dimension of the embedding column matches the vector dimension of the model you intend to use.

Example SQL query to create a table with a vector column:

CREATE TABLE "destination-table" ("id" NUMBER, "text" TEXT, "embedding_result" VECTOR(float, 768));

The vector dimension is set to a fixed value for each embedding model. To find the value, see the Model property, below.


Properties

Name = string

A human-readable name for the component.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Table = string

The Snowflake table that holds your source data.


Key Column = drop-down

Select the column to use as the primary key.


Text Column = drop-down

The column of data to convert into embeddings and then upsert into your destination table.


Limit = integer

Set a limit for the maximum number of rows to load from the table. The default is 1000.


Embedding Provider = drop-down

The embedding provider is the API service used to convert your text into vector embeddings. Choose either OpenAI or Amazon Bedrock. The embedding provider receives a piece of text (e.g. "How do I log in?") and returns a vector.

Choose your provider:

OpenAI

API Key = drop-down

Use the drop-down menu to select the secret definition that holds the value of your OpenAI API key.

Read Secret definitions to learn how to create a new secret definition.

To create a new OpenAI API key:

  1. Log in to OpenAI.
  2. Click your avatar in the top-right of the UI.
  3. Click View API keys.
  4. Click + Create new secret key.
  5. Enter a name for your new secret key and click Create secret key.
  6. Copy your new secret key and save it. Then click Done.

Model = drop-down

Select an embedding model.

Currently supports:

Model                   Dimension
text-embedding-ada-002  1536
text-embedding-3-small  1536
text-embedding-3-large  3072
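Because the destination table's vector column must be created with the model's exact dimension, it can help to derive the column DDL from the chosen model. A minimal Python sketch (the mapping mirrors the table above; `vector_column_ddl` is illustrative, not part of the component):

```python
# Dimensions of the supported OpenAI embedding models (from the
# table above). Each model always returns vectors of this size.
OPENAI_MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def vector_column_ddl(model: str, column: str = "embedding_result") -> str:
    """Build a Snowflake VECTOR column definition sized for the model."""
    dimension = OPENAI_MODEL_DIMENSIONS[model]
    return f'"{column}" VECTOR(float, {dimension})'

print(vector_column_ddl("text-embedding-3-large"))
# "embedding_result" VECTOR(float, 3072)
```

Using the returned fragment in a CREATE TABLE statement ensures the table's dimension cannot drift from the model you selected.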

API Batch Size = integer

Set the number of rows sent per API call. The default is 10. At this default, 1000 rows would require 100 API calls.

You may wish to reduce this number if each row contains a high volume of data; conversely, increase this number for rows with low data volume.
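The relationship between the row limit, the batch size, and the resulting number of API calls can be sketched as follows (a hedged illustration; `num_api_calls` is not part of the component):

```python
import math

def num_api_calls(row_count: int, batch_size: int) -> int:
    # Each API call embeds one batch of rows, so the number of calls
    # is the row count divided by the batch size, rounded up.
    return math.ceil(row_count / batch_size)

print(num_api_calls(1000, 10))  # default Limit and API Batch Size -> 100 calls
print(num_api_calls(1000, 50))  # larger batches -> 20 calls
```

Note that a partially filled final batch still costs one API call, which is why the division rounds up.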

Amazon Bedrock

Region = drop-down

Select your AWS region.


Model = drop-down

Select an embedding model.

Currently supports:

Model                       Dimension
Titan Embeddings G1 - Text  1536

Database = drop-down

The destination Snowflake database. This can be the same as the source database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The destination Snowflake schema. This can be the same as the source schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Table = drop-down

Select the destination table.


Key Column = drop-down

The column in the destination table to use as the key column.


Text Column = drop-down

The column in the destination table that will hold the copied source data.


Embedding Column = drop-down

The column in the destination table that will hold the vector embeddings.

