AI Summarize

Public preview

Editions

Production use of this feature is available for specific editions only. Contact our sales team for more information.

The AI Summarize transformation component uses the Databricks ai_summarize() function to generate a summary of a given input text. This function uses a Databricks chat model serving endpoint made available by Databricks Foundation Model APIs.

The input is a column of text data that is to be summarized. The output is a column of text data that summarizes the input text. You can select multiple columns to summarize from a single input table, and you can select the same input column multiple times to generate alternative summaries.

All rows in the selected input column will be summarized. If the content of any input row is NULL, the output for that row will be NULL.
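The row-by-row behavior described above can be modeled with a minimal Python sketch. It is not how the component is implemented: `summarize_rows` and `first_words` are hypothetical names, and `first_words` is only a stand-in that truncates text so the example runs without a Databricks endpoint. The sketch shows that every row is processed and that a NULL input (here, `None`) yields a NULL output.

```python
from typing import Callable, List, Optional


def summarize_rows(
    rows: List[Optional[str]],
    summarize: Callable[[str], str],
) -> List[Optional[str]]:
    """Apply `summarize` to every row; NULL (None) rows stay NULL."""
    return [None if text is None else summarize(text) for text in rows]


def first_words(text: str, limit: int = 5) -> str:
    """Stand-in summarizer (assumption): keeps the first `limit` words.

    This is NOT the model behind ai_summarize(); it only makes the sketch runnable.
    """
    return " ".join(text.split()[:limit])


rows = ["The delivery arrived late but the product works well", None]
summaries = summarize_rows(rows, first_words)
```

The output list has one entry per input row, so the summary column always lines up with the input column.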

Note

Make sure you have read and understood the Requirements set out by Databricks before using this component.

Use case

Some typical use cases for this component include:

  • Summarizing long text documents, such as articles, reports, or customer feedback, to extract key points and insights.
  • Generating concise summaries of product descriptions, reviews, or user comments to provide quick overviews of content.
  • Creating summaries of meeting notes, emails, or chat conversations to highlight important information and action items.

Properties

Name = string

A human-readable name for the component.


Columns = column editor

  • Input Column: The drop-down lists each column in the input stream. Choose the column to summarize. All input columns are available to select, but only text columns will produce meaningful summaries.
  • Alias: The name used for the output column that corresponds to this input column. Aliases must be unique.
  • Max number of words in summary: Optionally, specify the maximum number of words to allow in the summary text. If no number is specified, the default value is 50. If set to 0, there is no maximum word limit.

You can select multiple columns to summarize. You can also select the same column multiple times to generate alternative summaries, for example with different numbers of max words. In this case, the Alias allows you to differentiate the output columns.
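As a rough illustration of how these settings relate to the underlying function, the following Python sketch turns a list of column entries into `ai_summarize()` select expressions. It is an assumption about the shape of the SQL, not the SQL the component actually generates. `SummaryColumn` and `summary_expressions` are hypothetical names, and the column names in the example are invented.

```python
from typing import List, NamedTuple, Optional

DEFAULT_MAX_WORDS = 50  # default applied when no maximum is specified


class SummaryColumn(NamedTuple):
    """One row of the Columns editor (hypothetical representation)."""
    input_column: str
    alias: str
    max_words: Optional[int] = None  # None means "not specified"


def summary_expressions(columns: List[SummaryColumn]) -> List[str]:
    """Build one ai_summarize() expression per entry, enforcing unique aliases."""
    aliases = [c.alias for c in columns]
    if len(set(aliases)) != len(aliases):
        raise ValueError("Aliases must be unique")
    expressions = []
    for c in columns:
        max_words = DEFAULT_MAX_WORDS if c.max_words is None else c.max_words
        if max_words < 0:
            raise ValueError("Max number of words cannot be negative")
        # 0 is passed through unchanged: it means no maximum word limit.
        # Identifiers are backtick-quoted; names containing backticks are not handled.
        expressions.append(
            f"ai_summarize(`{c.input_column}`, {max_words}) AS `{c.alias}`"
        )
    return expressions


columns = [
    SummaryColumn("review_text", "short_summary", 20),
    SummaryColumn("review_text", "full_summary", 0),  # same column, no word limit
    SummaryColumn("notes", "notes_summary"),          # falls back to the default of 50
]
exprs = summary_expressions(columns)
```

Selecting `review_text` twice with different limits produces two distinct output columns, which is why each entry needs its own alias.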


Include Input Columns = boolean

  • Yes: Outputs both the source input columns and the new summary columns. This includes input columns that were not selected in the Columns property.
  • No: Only outputs the new summary columns.
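The effect of this setting on the output can be sketched in Python as a choice between two projections. This is a simplified assumption about the resulting query shape, not the component's actual generated SQL. `build_select` is a hypothetical helper, and the table and column names are invented.

```python
from typing import List


def build_select(
    table: str,
    summary_exprs: List[str],
    include_input_columns: bool,
) -> str:
    """Return a SELECT that optionally keeps every input column (sketch only)."""
    # Yes: pass all input columns through with `*`, then append the summaries.
    # No: project only the summary columns.
    projection = (["*"] if include_input_columns else []) + summary_exprs
    return f"SELECT {', '.join(projection)} FROM {table}"


summary = ["ai_summarize(`review_text`, 50) AS `review_summary`"]
with_inputs = build_select("reviews", summary, include_input_columns=True)
summaries_only = build_select("reviews", summary, include_input_columns=False)
```

With Yes, the output keeps all columns from `reviews`, including those that were never summarized; with No, only `review_summary` remains.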
