Convert String To Struct

The Convert String To Struct transformation component takes JSON-formatted data stored in a STRING column and converts it to a Databricks STRUCT data type, which can then be queried and manipulated.

In SQL terms, this is equivalent to performing a SELECT CAST() operation on the string data.

Many APIs return JSON data as a string, but a string is rarely a useful format for JSON because transformation components can't directly query or extract individual JSON elements within it. To address specific elements in a JSON string, first use Convert String To Struct, and then use a component such as Extract Structured Data.
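The difference between a JSON string and a parsed structure can be illustrated outside the pipeline with a minimal Python sketch. This is not what the component runs internally; the payload and variable names are hypothetical, and the standard-library json module stands in for the conversion step.

```python
import json

# Hypothetical API response held as a plain string (illustrative data only).
raw = '{"id": 7, "customer": {"name": "Ada", "city": "Leeds"}}'

# While the payload is a string, it is just text: there is no "customer" key to address.
print(type(raw).__name__)

# Parsing turns the text into a nested structure...
parsed = json.loads(raw)

# ...whose elements can then be addressed individually.
customer_name = parsed["customer"]["name"]
print(customer_name)
```

In the pipeline, Convert String To Struct plays the role of the parsing step, and a downstream component such as Extract Structured Data plays the role of the element lookup.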

The component operates on a single selected column from the input dataset. If the dataset includes multiple columns to be converted, add one instance of Convert String To Struct to the pipeline for each column.

Use case

Some common uses for this component include:

  • Extract fields from JSON strings in an input table. You can't access individual items within a JSON string until it has been parsed. This component converts the string into a structured object, allowing you to reference its elements in downstream components.
  • Work with string data returned from an API by a Custom Connector. Many APIs return JSON data in a string format, which must be converted before it can be usefully manipulated.

Properties

Name = string

A human-readable name for the component.


Column Name = drop-down

Select the input column that contains the source text.

The component operates on only a single column of the input. If you need to convert multiple columns of a source table, use multiple copies of the Convert String To Struct component.


Columns = data structure

Use this property to define the output structure. The component attempts to match each element you define in this structure to an element in the text input. If an element defined here doesn't match any part of the input string, that element is output with a value of null.

The Columns dialog shows a graphical representation of the output structure.

  • To add a new element, click ... next to the STRUCT element at the top of the structure, and then click Add element. Each element should be assigned a unique Key and a Type.
  • To edit an element's Key or Type, click ... next to the element and click Edit element.
  • To delete an element, click ... next to it and click Delete element.
  • To remove all elements added to the structure so far, reverting to a blank structure, click Reset.

Click Save when you have finished editing and selecting elements.
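The matching behavior described for the Columns property can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's implementation: the helper to_struct and the sample keys are hypothetical, the input is assumed to be a valid JSON object, and element types are ignored for brevity.

```python
import json


def to_struct(text, keys):
    """Return a dict holding exactly the requested keys.

    Hypothetical helper: keys found in the JSON text keep their value,
    and keys with no matching element are set to None (null).
    """
    parsed = json.loads(text)  # assumes text is a valid JSON object
    return {key: parsed.get(key) for key in keys}


# "priority" is defined in the output structure but absent from the input.
row = to_struct('{"id": 7, "status": "open"}', ["id", "status", "priority"])
print(row)  # {'id': 7, 'status': 'open', 'priority': None}
```

Note that input elements not listed in the structure (none in this example) are simply left out of the result in this sketch.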


Include Input Columns = boolean

Choose whether to include input columns in the output stream.

