AI Extract

AI Extract is a transformation component that uses the Databricks ai_extract() function to extract entities from a given text according to labels you provide. This function uses a Databricks chat model serving endpoint made available by Databricks Foundation Model APIs.

The output is a STRUCT in which each field is a string containing the extracted entity for the corresponding label. For example, if you provide the label email and an email address can be identified in the input, that address appears in the output STRUCT as {"email": "sales@example.com"}.

If the input is NULL, the output is NULL.
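
As a rough illustration of the call described above, the following Python sketch builds a Databricks SQL statement that passes one column and a list of labels to ai_extract(). The helper build_ai_extract_sql, the table support_tickets, the column body, and the output alias extracted are all hypothetical, and the restriction of labels to letters, digits, spaces, and underscores is an assumption made here to avoid quoting issues. The SQL that the component actually generates may differ.

```python
import re
from typing import Sequence

# Assumption for this sketch: labels are simple words, so no quote escaping is needed.
_SAFE_LABEL = re.compile(r"^[A-Za-z0-9_ ]+$")


def build_ai_extract_sql(table: str, column: str, labels: Sequence[str]) -> str:
    """Return a SELECT that calls ai_extract(content, labels) on a single column.

    Hypothetical helper for illustration only.
    """
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        if not _SAFE_LABEL.match(label):
            raise ValueError(f"unsupported label in this sketch: {label!r}")
    # ai_extract() takes the labels as an array of string literals.
    label_array = ", ".join(f"'{label}'" for label in labels)
    return (
        f"SELECT ai_extract(`{column}`, array({label_array})) AS extracted "
        f"FROM `{table}`"
    )


query = build_ai_extract_sql("support_tickets", "body", ["name", "email"])
print(query)
```

Each row of the result would then carry one STRUCT with a name field and an email field, or NULL where the body column is NULL.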

Note

Make sure you have read and understood the Requirements set out by Databricks before using this component.


Properties

Name = string

A human-readable name for the component.


Column = drop-down

Select an input column that contains the source text.

The component will operate on only a single column of the input. If you need to extract from multiple columns of a source table, use multiple copies of the AI Extract component.


Extract Labels = column editor

Enter the labels that define which entities will be extracted from the text. For example, name and email.

Enter one label per row in the Extract Labels dialog. Click + to add a new row.


Include Input Columns = boolean

  • Yes: Outputs all input columns alongside the extracted data column. This includes input columns that were not selected in Column.
  • No: Outputs only the extracted data column.
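
The two settings can be pictured as two different SELECT lists. The Python sketch below is only an approximation of that behavior: the helper build_select_list, the column body, and the output alias extracted are hypothetical, and the component's real generated SQL is not shown in this document.

```python
def build_select_list(extract_expr: str, include_input_columns: bool) -> str:
    """Return a SELECT list matching the two Include Input Columns settings."""
    # "extracted" is a hypothetical name for the extracted data column.
    extracted = f"{extract_expr} AS extracted"
    if include_input_columns:
        # Yes: every input column passes through, followed by the extracted column.
        return f"*, {extracted}"
    # No: only the extracted column is returned.
    return extracted


expr = "ai_extract(body, array('name', 'email'))"
with_inputs = build_select_list(expr, include_input_columns=True)
without_inputs = build_select_list(expr, include_input_columns=False)
print(with_inputs)
print(without_inputs)
```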