AI Mask
Public preview
Editions
Production use of this feature is available for specific editions only. Contact our sales team for more information.
The AI Mask transformation component uses the Databricks ai_mask() function to invoke generative AI to identify and mask specified entities in unstructured text. This function uses a Databricks chat model serving endpoint made available by Databricks Foundation Model APIs.
The input to this component is a column of data in string format. The component operates on a single column only, so to mask multiple text columns in the input datastream you need one instance of the AI Mask component per column, and you then combine the outputs downstream in your pipeline.
You also need to specify the type of data you want to be masked (for example: name, email). To do this, use the Mask Labels property.
The output is a column of string data with the labelled data masked.
Note
Make sure you have read and understood the Requirements set out by Databricks before using this component.
Example
A simple example of how masking works is as follows.
The input string is: "These comments were made by customer John Doe. For further clarification contact him at john.doe@company.com."
We want to redact the name and email before we share this data. To accomplish this, we run the AI Mask component with the labels person and email.
The output string is: "These comments were made by customer [MASKED]. For further clarification contact him at [MASKED]."
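For orientation, the example above corresponds roughly to a direct call to the Databricks ai_mask() function, which takes the text to mask and an array of labels. The following Python sketch builds such a SQL statement. The helper name build_ai_mask_query, the table name support_comments, the column name comment_text, and the output alias are all hypothetical, and the SQL that the component actually generates may differ.

```python
import re
from typing import Sequence

# Conservative patterns so that names and labels can be placed in SQL text safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_LABEL = re.compile(r"^[A-Za-z0-9_ ]+$")


def build_ai_mask_query(table: str, column: str, labels: Sequence[str]) -> str:
    """Return a SELECT statement that masks `column` with ai_mask().

    Hypothetical helper for illustration; not part of the component.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if not labels:
        raise ValueError("at least one mask label is required")
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"unsupported mask label: {label!r}")

    # ai_mask(content, labels) expects the labels as an array of strings.
    label_array = ", ".join(f"'{label}'" for label in labels)
    return (
        f"SELECT ai_mask({column}, array({label_array})) AS {column}_masked "
        f"FROM {table}"
    )


query = build_ai_mask_query("support_comments", "comment_text", ["person", "email"])
print(query)
```

The sketch rejects an empty label list because masking needs at least one label to look for. The strict patterns for identifiers and labels are a simplification chosen to avoid SQL quoting issues.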
Use case
The AI Mask component is used to mask sensitive information in any type of input text. Typical uses include:
- Automatically detect and redact Personally Identifiable Information (PII) or Protected Health Information (PHI) from text fields before storage, sharing, or analysis, to facilitate compliance with privacy standards such as GDPR and HIPAA.
- Anonymize customer data (like account numbers, names, and emails) in support logs before sharing for training or analytics.
Properties
Name
= string
A human-readable name for the component.
Column
= drop-down
Select the column that holds the data you wish to mask.
Mask Labels
= column editor
Add mask labels as text strings. Each label represents a type of information to be masked. For example, adding a mask label of name prompts the component to mask any names in a given row.
Include Input Columns
= boolean
- Yes: Includes both your input column and the newly created masked column, as well as any input columns not selected in Column.
- No: Only includes the new, masked column.
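The effect of Include Input Columns on the shape of the output can be sketched in Python with plain dictionaries standing in for rows. This is only a model of the behavior described above, not the component's implementation: the function shape_output, the column names ticket_id, comment, and comment_masked, and the hard-coded masked value are all hypothetical, and no model is called.

```python
from typing import Dict, List


def shape_output(
    rows: List[Dict[str, str]],
    masked_values: List[str],
    masked_column: str,
    include_input_columns: bool,
) -> List[Dict[str, str]]:
    """Attach masked values to rows, keeping or dropping the input columns."""
    if len(rows) != len(masked_values):
        raise ValueError("rows and masked_values must have the same length")
    output = []
    for row, masked in zip(rows, masked_values):
        # Yes: keep every input column. No: start from an empty row.
        new_row = dict(row) if include_input_columns else {}
        new_row[masked_column] = masked
        output.append(new_row)
    return output


rows = [{"ticket_id": "42", "comment": "Contact John Doe."}]
masked = ["Contact [MASKED]."]  # stand-in for the masked text

with_inputs = shape_output(rows, masked, "comment_masked", include_input_columns=True)
only_masked = shape_output(rows, masked, "comment_masked", include_input_columns=False)
print(with_inputs)
print(only_masked)
```

With the option set to Yes, the unselected ticket_id column and the original comment column pass through alongside the masked column. With No, only the masked column remains.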