Extract Nested Data
The Extract Nested Data component unpacks arrays of semi-structured data, for example in a JSON file, into structured data, in the form of columns and rows of data in a table. The component is especially useful when using a Custom Connector to access an API that returns data in JSON format. The flow into this component should include a single variant-type column that is to be unpacked.
Each element in the source semi-structured data can be mapped to a different column in your target table. For example, consider the following JSON structure, containing an array of three elements:
{
"name": "varchar",
"png": "varchar",
"alt": "varchar"
}
When data in this format is extracted, each of the three array elements can be mapped to a different column in the target table. When we sample the data from the transformation, we will see something like the following table:
Name | Flag | Alt_Text |
---|---|---|
Cyprus | https://flagcdn.com/w320/cy.png | The flag of Cyprus. |
Somalia | https://flagcdn.com/w320/so.png | The flag of Somalia. |
Venezuela | https://flagcdn.com/w320/ve.png | The flag of Venezuela. |
Use Extract Nested Data's Columns property to define which elements in the source array will be mapped to columns in the target structured data.
Note
For an alternative method of extracting semi-structured data, you can use the Flatten Variant component.
Properties
Name
= string
A human-readable name for the component.
Include Input Columns
= boolean
Choose whether to include input columns in the output.
Columns
= data structure
Use this property to select which elements from the semi-structured input will be mapped to columns in the output. The Columns dialog shows a graphical represenation of every addressable element in the input. Each element has a corresponding checkbox. Select an element's checkbox to include that element in the output. No elements are selected by default.
- To select every element, click Select all.
- To deselect every element, click Clear all.
- To edit an element, click . . . , next to the element, and then click Edit element.
- To add a new element, click . . . next to the VARIANT heading at the top of the structure, and then click Add element. Each element should be assigned a unique Key, a Type, and an Alias.
- To delete an element, click . . . next to it and click Delete element.
- To automatically add every element to the structure, click Autofill.
- To remove all elements added to the structure so far, reverting to a blank structure, click Reset.
Click Save when you have finished editing and selecting elements.
Outer Join
= boolean
Determines how to handle input rows that can't be expanded (for example, because they have no fields to expand, or because they can't be accessed). Select No to completely omit these rows from the output, Yes to generate an output row with NULL
values.
Input Alias
= string
If two input elements have identical names, one will be given this prefix to differentiate them. More than two identically named elements will result in an error. The default is i
, and this does not need changing in the vast majority of use cases.
Array Prefix
= string
If two array structures have identical names, one will be given this prefix to differentiate them. More than two identically named structures will result in an error. The default is f
, and this does not need changing in the vast majority of use cases.
Casting Method
= drop-down
Select how invalid or unparsable input elements will be handled:
- Fail on invalid data (the default)
- Replace all unparseable values with null
- Replace unparsable dates and timestamps with null
Case Columns Alias Names
= drop-down
Set the case that will be used for alias column names. Settings include Upper, Lower, or No (the default).
Name
= string
A human-readable name for the component.
Columns
= data structure
Use this property to select which elements from the semi-structured input will be mapped to columns in the output. The Columns dialog shows a graphical represenation of every addressable element in the input. Each element has a corresponding checkbox. Select an element's checkbox to include that element in the output. No elements are selected by default.
- To select every element, click Select all.
- To deselect every element, click Clear all or Reset.
Click Save when you have finished editing and selecting elements.
Aliases
= column editor
By default, the output columns will have the same names as the input elements. You can rename the output columns by specifying aliases in this dialog.
- Source Column: Select the source element that you wish to provide an alias for.
- Target Column: Provide a name for the output column.
You can provide aliases for any or all input elements. This property is optional, however, and can be left empty if you don't wish to provide alternative names for any columns.
Include Input Columns
= boolean
Choose whether to include input columns in the output.
Outer Join
= boolean
Determines how to handle input rows that can't be expanded (for example, because they have no fields to expand, or because they can't be accessed). Select No to completely omit these rows from the output, or Yes to generate an output row with NULL
values.
Name
= string
A human-readable name for the component.
Include Input Columns
= boolean
Choose whether to include input columns in the output.
Columns
= data structure
Use this property to select which elements from the semi-structured input will be mapped to columns in the output. The Columns dialog shows a graphical represenation of every addressable element in the input. Each element has a corresponding checkbox. Select an element's checkbox to include that element in the output. No elements are selected by default.
- To select every element, click Select all.
- To deselect every element, click Clear all or Reset.
Click Save when you have finished editing and selecting elements.
Column Aliases
= string
If two input elements have identical names, one will be given this prefix to differentiate them. More than two identically named elements will result in an error. The default is i, and this does not need changing in the vast majority of use cases.
Array Prefix
= string
If two array structures have identical names, one will be given this prefix to differentiate them. More than two identically named structures will result in an error. The default is f, and this does not need changing in the vast majority of use cases.
Snowflake | Databricks | Amazon Redshift |
---|---|---|
✅ | ✅ | ✅ |