Postgres Vector Upsert

Editions

Production use of this feature is available for specific editions only. Contact our sales team for more information.

Postgres Vector Upsert is an orchestration component that lets you convert text data stored in your cloud data warehouse into embeddings and then store those embeddings as vectors in your Postgres vector database.

Properties

Name = string

A human-readable name for the component.

Select your cloud data warehouse: Snowflake, Databricks, or Amazon Redshift. The source properties for each warehouse are listed below in that order.

Database = drop-down

The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema = drop-down

The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Table = string

The Snowflake table that holds your source data.

Catalog = drop-down

Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.

Schema (Database) = drop-down

The Databricks schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Table = drop-down

The Databricks table that holds your source data.

Schema = drop-down

The Amazon Redshift schema. The special value [Environment Default] uses the schema defined in the environment. Read Schemas to learn more.

For more information on using multiple schemas, read Schemas.

Table = drop-down

An existing Redshift table to use as the input.

Key Column = drop-down

Set a column as the primary key.

Text Column = drop-down

The column whose text is converted into embeddings and then upserted into your Postgres vector database.

Limit = integer

Set a limit on the number of rows to load from the table. The default is 1000.

Embedding Provider = drop-down

The embedding provider is the API service used to convert your text data into vectors. Choose either OpenAI or Amazon Bedrock. The embedding provider receives a piece of text (e.g. "How do I log in?") and returns a vector.

Choose your provider: OpenAI or Amazon Bedrock. The properties for each provider are listed below in that order.

OpenAI API Key = drop-down

Use the drop-down menu to select the corresponding secret definition that denotes the value of your OpenAI API key.

Read Secrets and secret definitions to learn how to create a new secret definition.

To create a new OpenAI API key:

1. Log in to OpenAI.
2. Click your avatar in the top-right of the UI.
3. Click View API keys.
4. Click + Create new secret key.
5. Give a name for your new secret key and click Create secret key.
6. Copy your new secret key and save it. Then click Done.

Embedding Model = drop-down

Select an embedding model.

Currently supports:

Model	Dimension
text-embedding-ada-002	1536
text-embedding-3-small	1536
text-embedding-3-large	3072

API Batch Size = integer

Set the number of rows sent to the embedding provider per API call. The default is 10. With a batch size of 10, 1000 rows therefore require 100 API calls.

You may wish to reduce this number if each row contains a large volume of data and, conversely, increase it if each row contains little data.
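
The batch arithmetic above can be sketched in a few lines of Python. This is only an illustration of how rows divide into API calls, not the component's actual implementation; the function names api_calls_needed and batches are hypothetical.

```python
import math
from typing import Iterator, List, Sequence


def api_calls_needed(row_count: int, batch_size: int = 10) -> int:
    """Number of API calls needed to embed row_count rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # A final, partially filled batch still costs one call.
    return math.ceil(row_count / batch_size)


def batches(rows: Sequence[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])
```

For example, api_calls_needed(1000, 10) gives the 100 calls mentioned above, while raising the batch size to 50 would bring that down to 20 calls at the cost of larger request payloads.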

Region = drop-down

Select your AWS region.

Embedding Model = drop-down

Select an embedding model.

Currently supports:

Model	Dimension
Titan Embeddings G1 - Text	1536
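
The dimension values in the model tables matter because the embedding column in your Postgres table has to accept vectors of exactly that length. The following Python sketch looks up the dimension for a model and derives a matching column type. It assumes the pgvector extension and its vector(n) type, which this page does not name explicitly; the helper names are hypothetical.

```python
from typing import List

# Dimensions copied from the model tables on this page.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "Titan Embeddings G1 - Text": 1536,
}


def vector_column_type(model: str) -> str:
    """Return a column type sized for the model (assumes pgvector's vector(n))."""
    if model not in EMBEDDING_DIMENSIONS:
        raise ValueError(f"Unknown embedding model: {model!r}")
    return f"vector({EMBEDDING_DIMENSIONS[model]})"


def check_embedding(model: str, embedding: List[float]) -> None:
    """Raise if an embedding's length does not match the model's dimension."""
    expected = EMBEDDING_DIMENSIONS[model]
    if len(embedding) != expected:
        raise ValueError(
            f"{model} should produce {expected} values, got {len(embedding)}"
        )
```

Under that assumption, a table populated with text-embedding-3-large would need an embedding column declared as vector(3072), and switching to a model with a different dimension would require a different column.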

Host = string

Your Postgres hostname.

Port = string

The TCP port number the Postgres server listens on. The default is 5432.

Database = string

The name of your Postgres database.

Username = string

Your Postgres username.

Password = drop-down

Use the drop-down menu to select the corresponding secret definition that denotes the value of your Postgres password.

Read Secrets and secret definitions to learn how to create a new secret definition.

Schema = drop-down

The Postgres schema. The available schemas are determined by the Postgres database you have provided.

Table = drop-down

The table to upsert data into. The available tables are determined by the Postgres schema you have selected.

Key Column Name = drop-down

The column in your table to use as the key column.

Text Column Name = drop-down

The column in your table with your original text data.

Embedding Column Name = drop-down

The column in your table used to store your embeddings.
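
To show how the key, text, and embedding columns relate in an upsert, here is a minimal Python sketch that builds a standard Postgres INSERT ... ON CONFLICT statement. It is an illustration under stated assumptions, not necessarily the SQL the component issues: the function name build_upsert_sql is hypothetical, the %s placeholders follow the psycopg driver's style, and ON CONFLICT requires a unique or primary key constraint on the key column.

```python
def quote_identifier(identifier: str) -> str:
    """Double-quote a Postgres identifier, escaping embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def build_upsert_sql(
    schema: str,
    table: str,
    key_column: str,
    text_column: str,
    embedding_column: str,
) -> str:
    """Build a parameterized upsert: insert a row, or update it if the key exists."""
    target = f"{quote_identifier(schema)}.{quote_identifier(table)}"
    key = quote_identifier(key_column)
    text = quote_identifier(text_column)
    embedding = quote_identifier(embedding_column)
    return (
        f"INSERT INTO {target} ({key}, {text}, {embedding}) "
        f"VALUES (%s, %s, %s) "
        f"ON CONFLICT ({key}) DO UPDATE SET "
        f"{text} = EXCLUDED.{text}, {embedding} = EXCLUDED.{embedding}"
    )
```

With this shape, a row whose key already exists has its text and embedding overwritten, while a new key creates a new row.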

Connection Options = column editor

Parameter: A JDBC Postgres parameter supported by the database driver.
Value: A value for the given parameter.
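
For orientation, the connection properties above map naturally onto a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database?parameter=value. The Python sketch below assembles such a URL. Whether the component builds its connection this way is an assumption, build_jdbc_url is a hypothetical helper, and the option names in the test values (sslmode, connectTimeout) are examples of parameters supported by the PostgreSQL JDBC driver.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def build_jdbc_url(
    host: str,
    database: str,
    port: int = 5432,
    options: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a PostgreSQL JDBC URL from host, port, database, and options."""
    # Percent-encode the database name in case it contains special characters.
    url = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if options:
        # Each Parameter/Value pair becomes one query-string entry.
        url += "?" + urlencode(options)
    return url
```

Each row in the Connection Options editor corresponds to one entry in the options dictionary here. The username and password are deliberately left out of the URL, in keeping with storing the password as a secret definition.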

Snowflake	Databricks	Amazon Redshift
✅	✅	✅