Cortex Embed

Editions

Production use of this feature is available for specific editions only. Contact our sales team for more information.

Project Availability:

Snowflake ✅ Databricks ❌ Amazon Redshift ❌

Component Type:

Transformation

Connection Inputs:

One

Connection Outputs:

Unlimited

The Cortex Embed transformation component lets you convert English-language text into a Cortex vector embedding. The component takes text from a single input column and creates a vector embedding of either 768 or 1024 dimensions, depending on the model you select. The output of this component is a column containing the vector embedding string, named embedding_result_<input-column-name>.

To learn more about Cortex vector embeddings, read Vector Embeddings.

To use this component, you must use a Snowflake role that has been granted the SNOWFLAKE.CORTEX_USER database role. Read Required Privileges to learn more about granting this role.

To learn more about Snowflake Cortex, such as availability, usage quotas, managing costs, and more, read Large Language Model (LLM) Functions (Snowflake Cortex).

Use case

This component enables you to create embeddings directly in your transformation pipeline, which can then be used for semantic searching, clustering, and identifying similar content. For example, you can use this component to:

Generate embeddings for video transcripts, which you can then search to find relevant content.
Convert product descriptions into embeddings that power "Similar products" suggestions for your customers.
Embed support tickets so that each ticket can be categorized and matched to the appropriate support team.
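
The use cases above all rely on comparing embeddings for similarity. The following is a minimal Python sketch of that comparison, using cosine similarity on small made-up vectors rather than real 768- or 1024-dimension embeddings. The function names and sample data are hypothetical and are not part of the component. In practice you would more likely run the comparison inside Snowflake, for example with its VECTOR_COSINE_SIMILARITY function.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Rank labelled vectors by similarity to the query, most similar first.

    candidates is a dict mapping a label to its embedding vector.
    """
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimension "embeddings" for three support teams.
teams = {
    "billing": [1.0, 0.0, 0.0],
    "networking": [0.0, 1.0, 0.0],
    "hardware": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of an incoming support ticket.
ticket = [0.9, 0.1, 0.0]

best_team, best_score = rank_by_similarity(ticket, teams)[0]
print(best_team, round(best_score, 3))
```

The same ranking approach applies to transcript search and product suggestions: embed the query or source item, then sort the stored embeddings by their similarity to it.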

Properties

Name = string

A human-readable name for the component.

Column to Embed = drop-down

Select the input column that will be used to create embeddings.

Model = drop-down

The vector embedding model to be used to generate the embedding.

For a vector embedding of 768 dimensions, choose one of:

snowflake-arctic-embed-m
e5-base-v2

For a vector embedding of 1024 dimensions, choose:

nv-embed-qa-4
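
To illustrate how the model choice determines the embedding size, here is a small Python sketch that maps each model listed above to its dimension count and builds a matching Snowflake Cortex function call. Snowflake Cortex provides the EMBED_TEXT_768 and EMBED_TEXT_1024 functions, but whether this component generates exactly this SQL is an assumption. The helper names are hypothetical.

```python
# Dimensions per model, as listed for the Model property above.
MODEL_DIMENSIONS = {
    "snowflake-arctic-embed-m": 768,
    "e5-base-v2": 768,
    "nv-embed-qa-4": 1024,
}


def quote_identifier(name):
    """Quote a Snowflake identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def embed_expression(model, column):
    """Build a Cortex embedding call for the given model and input column.

    Assumption: the component's behaviour corresponds to calling
    SNOWFLAKE.CORTEX.EMBED_TEXT_768 or EMBED_TEXT_1024 on the column.
    """
    if model not in MODEL_DIMENSIONS:
        raise ValueError(f"unsupported model: {model}")
    dims = MODEL_DIMENSIONS[model]
    return f"SNOWFLAKE.CORTEX.EMBED_TEXT_{dims}('{model}', {quote_identifier(column)})"


# Hypothetical column name, for illustration only.
print(embed_expression("e5-base-v2", "REVIEW_TEXT"))
```

Embeddings produced by different models are not comparable with one another, so use the same model for every value you intend to compare.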

Include Input Columns = boolean

Yes: Outputs all of your source input columns, including any not selected in Column to Embed, along with the new embedding_result_ column.
No: Only outputs the new embedding_result_ column.
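
The effect of Include Input Columns on the output can be sketched in Python as follows. This is an illustration only: the function and the sample column names are hypothetical, and the position of the embedding column relative to the input columns is an assumption.

```python
def output_columns(input_columns, column_to_embed, include_input_columns):
    """Return the column names the component would output.

    Assumption: the embedding column is placed after the input columns.
    """
    if column_to_embed not in input_columns:
        raise ValueError(f"{column_to_embed} is not an input column")
    result_column = f"embedding_result_{column_to_embed}"
    if include_input_columns:
        # Yes: every input column is kept, whether or not it was embedded.
        return list(input_columns) + [result_column]
    # No: only the new embedding column is output.
    return [result_column]


# Hypothetical input table with three columns.
columns = ["id", "title", "description"]

print(output_columns(columns, "description", True))
print(output_columns(columns, "description", False))
```

Choosing No drops key columns such as id in this example, so choose Yes if you need to join the embeddings back to other data.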