Databricks Vector Search

Public preview

Editions

Production use of this feature is available for specific editions only. Contact our sales team for more information.

The Databricks Vector Search transformation component takes questions from an input table and searches a vector search index in your Databricks account for the content that best answers each one, using vector embeddings to match questions to suitable answers.

The component takes input from a single table that contains the questions you are asking in plain text, and searches a pre-existing index of data to find the best-fit answers.

For each input row, the component outputs the best-fit answers to that row's query, with as many answers per query as specified in the Top K property. The answers are output as a column of JSON objects.
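The following is a minimal conceptual sketch of a top-K vector search, written in plain Python rather than against any Databricks API. It is not how the component or Databricks implements the search. The toy index, the hand-written three-dimensional embeddings, the cosine similarity scoring, and the JSON field names (answer, score) are all assumptions made for illustration; the real output schema may differ.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_answers(query_vector, index, k):
    """Return the k index entries most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Field names here are hypothetical, not the component's real schema.
    return [{"answer": text, "score": round(score, 3)} for score, text in scored[:k]]


# Toy stand-in for a pre-existing index: (text, embedding) pairs.
index = [
    ("Reset your password from the login page.", [0.9, 0.1, 0.0]),
    ("Invoices are emailed monthly.", [0.0, 0.2, 0.9]),
    ("Contact support to unlock your account.", [0.7, 0.6, 0.1]),
]

# Toy input table: one question per row. The embedding is written by hand
# here; a real system would compute it from the question text.
input_rows = [
    {"id": 1, "question": "How do I reset my password?", "embedding": [1.0, 0.2, 0.0]},
]

top_k = 2
output_rows = [
    {
        "id": row["id"],
        "question": row["question"],
        # One JSON value per input row, holding up to top_k answers.
        "answers": json.dumps(top_k_answers(row["embedding"], index, top_k)),
    }
    for row in input_rows
]

for row in output_rows:
    print(row)
```

With Top K set to 2, each output row carries a JSON array of two entries, ordered from most to least similar.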

To learn more about the vector search function in Databricks, read the Databricks documentation.

Use case

Typical use cases for a vector search include the following:

  • Performing a semantic text search to return the most contextually relevant documents, even if they don't share exact keywords.
  • Personalizing content retrieval by matching users to relevant content based on their interests or behavior embeddings.
  • Powering support systems by finding the closest pre-written response or FAQ entry for a customer's question.

Properties

Name = string

A human-readable name for the component.


Model Serving Endpoint = drop-down

Select an existing Databricks endpoint that will allow you to access the vector search index you wish to use.


Index = drop-down

Select an existing Databricks vector search index.


Query Column = drop-down

Select the column of the input that contains the questions. The component operates on a single input column only. If you have multiple question columns in the table, you'll need to perform additional transformations on your data to reduce them to a single column before querying.

All other columns in the input table (not only the column selected here) are also retrieved and included in the output.
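As an illustration of reducing several question columns to one, the sketch below reshapes rows so that each question ends up on its own row in a single column, with the remaining columns carried along. It is plain Python operating on dictionaries, not a pipeline component; the function name unpivot_questions and the column names q1, q2, and customer_id are hypothetical. In practice you would do the equivalent reshaping with a transformation upstream of this component.

```python
def unpivot_questions(rows, question_columns, target="question"):
    """Turn several question columns into one, emitting one row per question."""
    reshaped = []
    for row in rows:
        # Columns that are not questions are carried through unchanged.
        passthrough = {k: v for k, v in row.items() if k not in question_columns}
        for column in question_columns:
            value = row.get(column)
            if value:  # skip missing or empty questions
                reshaped.append({**passthrough, target: value})
    return reshaped


rows = [
    {"customer_id": 7, "q1": "How do I reset my password?", "q2": "Where is my invoice?"},
    {"customer_id": 8, "q1": "How do I close my account?", "q2": None},
]

for row in unpivot_questions(rows, ["q1", "q2"]):
    print(row)
```

The reshaped table has a single question column that can be selected as the Query Column, while customer_id remains available in the output.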


Top K = string

The number of results to return from the vector database query for each input row. Enter a whole number from 1 to 100. For example, 5 returns the five best-fitting answers to each query.
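Because Top K is entered as a string but must represent a number in the documented range, it can help to check the value before it reaches the pipeline. The sketch below is one possible check, assuming the range is inclusive at both ends and that surrounding whitespace should be ignored; neither is stated above. The function name parse_top_k is hypothetical and is not part of the component.

```python
def parse_top_k(raw):
    """Convert a Top K string to an int, enforcing the documented 1-100 range."""
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"Top K must be a whole number, got {raw!r}") from None
    # Assumption: both 1 and 100 are accepted.
    if not 1 <= value <= 100:
        raise ValueError(f"Top K must be between 1 and 100, got {value}")
    return value


print(parse_top_k("5"))
```

Values such as "0", "101", or "five" raise a ValueError with a message that explains the constraint.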
