Split field
The Split Field transformation component lets you split the contents of a selected column at the position of a specified delimiter, and output the separate parts to new columns. The delimiter can be any valid character or sequence of characters. For example, an input field of ABC-DEF
split on the delimiter -
will output the values ABC
and DEF
to two new columns.
This component is equivalent to using the SQL SPLIT_PART function.
This component works on columns with numeric or date data, as well as on columns of text data, with numbers and dates being split as if they were strings. For example, given a numeric column with the value 1234
, splitting this column on a delimiter of 2
will produce output values of 1
and 34
. Be aware of the regional date format when splitting dates, as it may give unexpected results. Sampling the input dataset to see how the date looks as a string may be helpful here. An alternative way of splitting dates would be to use a DATE_PART function in the Calculator component.
The output of this operation will always be string data, but you can subsequently use the Convert Type component to cast the data type appropriately.
Use case
This component is especially useful when dealing with semi-structured or loosely formatted data output by a source system. Some common uses of this are:
- Splitting CSV or other delimited strings into columns. For example, splitting an address string on commas into separate columns for city, state, and ZIP code.
- Splitting personal names into separate columns for first name and last name, by splitting the full name at the space character.
- Parsing data that contains rows with predictable patterns, for example, log data that shows a date, time, and message.
- Splitting a date into its component parts (day, month, year), for example, to allow trend analysis on a monthly basis.
Properties
Name
= string
A human-readable name for the component.
Input Column
= drop-down
The input field to split.
Delimiter
= string
The delimiter to split on. The delimiter can be a string of several characters. For example, in a string of values enclosed in quotation marks and delimited by commas, you could split on ","
, which would remove the quotes that would otherwise be included if you split only on commas.
Output Columns
= column editor
Define the new columns that the split values will be put into.
- Position: The index number of the split field fragment that will go into this column. This index starts at 1. For example, when splitting an input of
ABC-DEF
on the delimiter-
,ABC
is in position 1 andDEF
is in position 2. - Output Column Name: The output column is the new column that will hold the extracted data from the specified position. The column will be a string data type, but you can use the Convert Type component later to cast the data type appropriately if required.
Include Input Column
= boolean
Select whether to keep the original field in the output or not. If you have extracted all the parts of the field, you may not need to also keep the original.