
Pinecone Vector Upsert

The Pinecone Vector Upsert component lets you convert data stored in your cloud data warehouse into embeddings and then store these embeddings as vectors in your Pinecone vector database.


Data freshness

According to Pinecone's documentation:

Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries.

Keep this in mind when running query operations shortly after upsert operations.
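One way to allow for this delay is to poll until the new records become visible rather than querying immediately. The sketch below is a generic polling helper, not part of the component or the Pinecone client. The names `wait_until_visible` and `is_visible` are hypothetical, and `is_visible` stands in for whatever check you use (for example, fetching a known record ID).

```python
import time
from typing import Callable


def wait_until_visible(
    is_visible: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_visible() until it returns True or timeout_s has elapsed.

    is_visible is a caller-supplied check (hypothetical here), such as
    "does a fetch for the record ID I just upserted return a result?".
    Returns True if the check passed, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_visible():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The timeout and interval values are illustrative only; suitable values depend on your index and workload.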


Properties

Name = string

A human-readable name for the component.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Table = string

The Snowflake table that holds your source data.


Key Column = drop-down

Set a column as the primary key.


Text Column = drop-down

The column of data to convert into embeddings to then be upserted into your Pinecone vector database.


Limit = integer

Set a limit for the number of rows to load from the table. The default is 1000.


Embedding Provider = drop-down

The embedding provider is the API service used to convert the text in the Text Column into vectors. Currently supports OpenAI. The embedding provider receives a piece of text (e.g. "How do I log in?") and returns a vector.


OpenAI API Key = drop-down

Use the drop-down menu to select the corresponding secret definition that denotes the value of your OpenAI API key.

Read Secret definitions to learn how to create a new secret definition.

To create a new OpenAI API key:

  1. Log in to OpenAI.
  2. Click your avatar in the top-right of the UI.
  3. Click View API keys.
  4. Click + Create new secret key.
  5. Give a name for your new secret key and click Create secret key.
  6. Copy your new secret key and save it. Then click Done.

Embedding Model = drop-down

Select an embedding model.

Currently supports:

  • text-embedding-ada-002
  • text-embedding-3-small
  • text-embedding-3-large
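The choice of model matters because a Pinecone index has a fixed dimension, and the vectors you upsert must match it. The sketch below lists the default output dimensions of the three OpenAI models and checks them against an index dimension. The helper name `matches_index_dimension` is hypothetical, and the check is something you would run yourself, not a behavior of the component.

```python
# Default output dimensions of the supported OpenAI embedding models.
DEFAULT_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def matches_index_dimension(model: str, index_dimension: int) -> bool:
    """Return True if the model's default vector size equals the index dimension."""
    if model not in DEFAULT_DIMENSIONS:
        raise ValueError(f"Unknown embedding model: {model}")
    return DEFAULT_DIMENSIONS[model] == index_dimension
```

For example, `matches_index_dimension("text-embedding-3-large", 1536)` returns `False`, which would indicate a mismatch between the selected model and the target index.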

API Batch Size = integer

Set the size of the array of data sent per API call. The default size is 10. With a batch size of 10, 1000 rows would therefore require 100 API calls.

You may wish to reduce this number if a row contains a high volume of data; and conversely, increase this number for rows with low data volume.
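The relationship between row count, batch size, and number of API calls can be sketched as follows. This is an illustration of the arithmetic only, not the component's implementation; the function names `api_calls_needed` and `batches` are hypothetical.

```python
import math
from typing import Iterator, List, Sequence


def api_calls_needed(row_count: int, batch_size: int) -> int:
    """Number of API calls required to send row_count rows in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(row_count / batch_size)


def batches(rows: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of rows, each holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


print(api_calls_needed(1000, 10))  # 100
print(api_calls_needed(1000, 50))  # 20
```

A larger batch size means fewer calls but a larger payload per call, which is why rows with a high volume of text may call for a smaller value.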


Pinecone Environment = string

The Pinecone environment to use. To retrieve an environment name:

  1. Log in to Pinecone.
  2. Click PROJECTS in the left sidebar.
  3. Click a project tile. This action will open the list of vector search indexes in your project.
  4. Each vector search index has an environment displayed in the tile.

Pinecone Project Id = string

The ID of your Pinecone project. To retrieve a project ID:

  1. Log in to Pinecone.
  2. Click PROJECTS in the left sidebar. Each existing project will be displayed as a tile, displaying the project name and project ID.

Read Understanding projects to learn more.


Pinecone API Key = drop-down

Use the drop-down menu to select the corresponding secret definition that denotes the value of your Pinecone API key.

Read Secret definitions to learn how to create a new secret definition.

To create a new Pinecone API key:

  1. Log in to your organization's Pinecone account.
  2. Click API Keys in the left sidebar.
  3. Click + Create API Key. You must be a project owner to create an API key.
  4. In the Create New API Key modal, name your new key. Key names cannot be more than 7 characters.
  5. Click Create API Key.
  6. Your new API key will be listed. Click the Show Key Value button (an icon of an eye with a slash through it) to view your API key value.
  7. Click the adjacent Copy Key Value button to copy your API Key.

Pinecone Index Name = drop-down

The name of the Pinecone vector search index to connect to. The list is populated once you provide a valid Pinecone API key.


Pinecone Namespace = string

The name of the Pinecone namespace. Pinecone lets you partition records in an index into namespaces. To retrieve a namespace name:

  1. Log in to Pinecone.
  2. Click PROJECTS in the left sidebar.
  3. Click a project tile. This action will open the list of vector search indexes in your project.
  4. Click on your vector search index tile.
  5. Click the NAMESPACES tab. Your namespaces will be listed.

Upsert Batch Size = integer

Set the size of the batches of vectors that Pinecone receives. The default size is 100 vectors per request.

You may wish to reduce this number if a row contains a high volume of data; and conversely, increase this number for rows with low data volume.
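As a rough illustration of upsert batching, the sketch below pairs keys with embedding vectors and groups the resulting records into batches. The `id` and `values` fields follow Pinecone's record format. Using the Key Column value as the record ID is an assumption about how the component maps its inputs, and the function names `to_records` and `upsert_batches` are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence

Record = Dict[str, object]


def to_records(keys: Sequence[object], vectors: Sequence[Sequence[float]]) -> List[Record]:
    """Pair each key with its vector in Pinecone's record shape.

    Assumption: the Key Column value becomes the record "id".
    """
    if len(keys) != len(vectors):
        raise ValueError("keys and vectors must have the same length")
    return [{"id": str(k), "values": list(v)} for k, v in zip(keys, vectors)]


def upsert_batches(records: Sequence[Record], batch_size: int = 100) -> Iterator[List[Record]]:
    """Yield groups of at most batch_size records, one group per upsert request."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])
```

With the default of 100 vectors per request, 250 records would be sent as three requests of 100, 100, and 50 records.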

