Grid variables
Grid variables are a special type of variable. A grid variable is a two-dimensional array that holds multiple values in named columns.
A grid variable can only be declared as a pipeline variable, not a project variable.
Creating grid variables
Creating a grid variable follows a similar process to creating any other variable, but has some important differences as described here.
- Open a pipeline on the pipeline canvas.
- Click the Variables tab to see all pipeline and project variables available to the current pipeline. In this tab, you can edit or delete any existing variables, or create a new variable.
- Click Add to open the Add a variable dialog.
- Select Pipeline variable.
- Select Grid in the Type drop-down.
-
Complete the following fields, and then click Next:
- Variable name: Each variable (grid or otherwise) must have a unique name within this pipeline, though variables with the same name can be defined in a different pipeline. Variable names can include letters, digits, and underscores, but can't begin with a digit. For example,
my_table_02
and_02_my_table
are valid names, but02_my_table
is not. A variable name must not be a JavaScript reserved word (e.g.var
orconst
). If you want to use the variable in a Python or Bash script, it must not have the same name as a reserved word in those scripts. - Description: A description to inform other users of the purpose of the variable. This is optional, but its use is recommended.
- Visibility: Select Public or Private. Read Grid variable visibility, below, for a full explanation.
- Variable name: Each variable (grid or otherwise) must have a unique name within this pipeline, though variables with the same name can be defined in a different pipeline. Variable names can include letters, digits, and underscores, but can't begin with a digit. For example,
-
Create columns in the grid variable, clicking + to add additional columns as needed. See Columns, below, for an explanation of how to use columns. Click Next when all the required columns are defined.
- Add default values to the variable if required. This isn't mandatory, but adding sensible defaults can help reduce errors and aid in debugging when the pipeline runs. See Row values, below, for an explanation of using a grid variable to store values.
- Click Finish to save the grid variable.
Text mode
Text mode is a feature that lets you rapidly add column definitions and default values to your grid variable. Instead of typing or selecting values in the individual fields for these items, you can click the Text mode toggle to open a multi-line editor that lets you add all the items in a single block, one item to a line, with the "fields" separated by tabs. For example, to define three columns in text mode you can type the following:
Name Text
Descr Text
Cost Number
Once you have entered details in text mode, you must switch out of text mode by clicking the Text mode toggle once more, to verify the details are valid before you continue. For example, if you mistyped Cost Numbor
, the Column type field for "Cost" would reject the invalid type "Numbor" and remain blank. You must correct the type, either directly in the field or in text mode again, before you can proceed with variable creation.
You can go into text mode to edit the column definitions or default values of an existing grid variable, regardless of whether it was originally created in text mode or not.
An advantage of text mode is that it will allow you to paste in values from other text sources, as long as the source has items correctly separated by tabs and new lines. You can also rapidly create a copy of an existing grid variable:
- Open the existing grid variable in text mode.
- Copy the text containing the column definitions.
- Create the new grid variable.
- Paste the copied text into the text mode field of the new grid variable.
Similarly, you can rapidly copy default values from one grid variable to another, assuming they have matching columns.
Columns
A grid variable is a two-dimensional array, or table, consisting of one or more named columns with one or more values in each column. The columns in a grid variable are defined and fixed when the variable is created, but the values they contain can be changed dynamically at runtime.
Each column in the grid can be a different variable type, so you could have a column of text and another column of numbers, for example.
To define the columns, enter the following details when the grid variable is created:
- Column Name: A name to identify the column. This must be unique within the grid variable, but can be the same as column names defined in other grid variables. The name can only contain letters, numbers, underscores
_
and dollar symbols$
. It can't start with a number. - Column Type: The data type that the column will contain. Select Text or Number. The values stored in each column must be suitable for the data type of the column.
Row values
Each row of a grid variable holds a single value for each of the defined columns. This allows you to construct a "table" of related information in a single grid variable. For example, a grid variable with three columns defined as "Name" (text), "Salary" (number) and "Location" (text) could have three rows of values as follows:
Name | Salary | Location |
---|---|---|
Alice | 35000 | New York |
Bob | 40000 | San Francisco |
Charlie | 30000 | Chicago |
When a pipeline run begins, all of its grid variables are initialized with the default values that were specified when the variable was created. Variable values can be assigned or reassigned dynamically while the pipeline runs. Values set during the run are valid only for the current run of the pipeline, and for the next run of the pipeline the variable will revert to its default values, or a new value must be assigned.
If you construct a branching pipeline and update a grid variable value in one branch, the value only changes in that branch; the other branch will use the original value of the variable. In effect, each branch of a complex pipeline has its own "local copy" of a grid variable, each of which can be updated independently. If the branches later rejoin, for example through the use of an And or Or component, the grid variable will revert to its default values, regardless of any updates made in either branch.
Grid variable visibility
Setting the correct visibility for a pipeline grid variable is important when you want to call a pipeline from another pipeline, using the Run Orchestration or Run Transformation components.
A Private grid variable is only visible to the pipeline it is defined in. If the pipeline is called from another pipeline, the calling pipeline can't "see" the Private variable and so can't use its value, reset its value, or otherwise interact with it in any way.
A Public grid variable is visible outside the pipeline it is defined in, so it can be "seen" and used by any pipeline that calls the pipeline where it is set.
Using grid variables
The following components are specifically for use with grid variables:
- Grid Iterator implements a loop over the rows of a grid variable, running an attached component multiple times, each time with a different row of values from the grid.
- JDBC Table Metadata to Grid takes the metadata from a JDBC table and uses this data to populate a grid variable.
- Query Result to Grid queries a table and returns rows of data that are loaded into a predefined grid variable.
- Remove from Grid removes rows of data (but not columns) from a grid.
- Table Metadata to Grid takes the metadata from a table and uses this data to populate a grid variable.
Grid variables may also be used to populate property values in many different components. Where grid variables can be used, the dialog which sets the property value will include a checkbox labelled Use Grid Variable.
Some common use cases for grid variables in components are described below.
Data selection
Query components have a Data Selection property that lets you select the columns you want from the data source. Instead of manually selecting the columns when you configure the query component, you can specify that a grid variable will provide the column names. By using a variable instead of hard-configuring the component, you can re-use the same column names across all components in the pipeline, making it easier to consistently set the property and make global changes to all such properties.
To use a grid variable in a Data Selection property:
- Create a grid variable with a single column of type Text.
-
Populate the default values of the grid variable with the names of the columns you want to include in the Data Selection property.
Note
These default values could be set or changed dynamically at runtime by use of other components such as Python Script.
-
In the Data Selection property of your query component, select Use Grid Variable. This will change the dialog to display Grid Variable and Grid Column drop-downs instead of the column selection listboxes.
- Use the drop-downs to select the grid variable and the name of the grid variable column.
- Click Save.
Populate metadata
In the Create Table and Create External Table components, the metadata required to define table columns in the Columns property can be assigned from a grid variable, as follows:
- Create a grid variable with one column for each piece of metadata you want to populate (Column Name, Data Type, Size, Precision, etc.).
-
Populate the default values of each column with the values of the metadata.
Note
These default values could be set or changed dynamically at runtime by use of other components such as Python Script.
-
In the Columns dialog, select Use Grid Variable. This will change the dialog to display Grid variable and Column Mapping drop-downs instead of the column definition fields.
- In the Grid variable drop-down, select which grid variable to use for metadata.
- In each Column Mapping drop-down, select the grid variable column that will be used to provide a value to each piece of column metadata.
- Click Save.
Populate load options
Query components have a Load Options property that configures different aspects of data loading. The Load Options dialog contains five drop-downs and a text field, which you set individually. Instead of manually selecting the setting of each drop-down, you can use a grid variable to make the selections. By using a variable, you can keep your preferred load options consistent across all query components in the pipeline, and easily make global changes to all such properties.
To use a grid variable in a Load Options property:
-
Create new grid variable with two columns defined as follows:
Column name Column type Key Text Value Text -
Click Next.
-
Add six rows to the variable, one for each of the six options in the Load Options dialog. In each row, the Key column must contain the name of one of the load options, and the corresponding Value column must contain a valid value for that load option. For example:
Key Value Clean Staged Files On String Null is Null Off Recreate Target Table On File Prefix Trim String Columns On Compression type Gzip Note
The values provided for the drop-downs must exactly match an option in the corresponding drop-down. For example, Clean Staged Files has a choice of
On
andOff
, and one of those words must be typed exactly in the value column, including the capitalization. -
In the Load Options dialog, select Use Grid Variable. This will change the dialog to display Grid variable and Column Mapping drop-downs instead of the load option drop-down fields.
- In the Grid variable drop-down, select the grid variable you created for load options.
- In the Column Mapping drop-down, select Value.
- Click Save.
Passing variables between pipelines
Grid variable values can be set when you call one pipeline from another using the Run Orchestration and Run Transformation components, which allows you to pass variable values from the "parent" to the "child" pipeline it is running. Use the Set Grid Variables property on those components to configure what values will be set in the "child" grid variables and what will be passed to them from the "parent".
Using grid variables in scripts
Python script
Grid variables are visible to Python scripts created in the Python Script component, and in those scripts they will act as any other Python variable.
In the Python script component, you can use Python's context
object to read and write grid variable values. Because grid variables are arrays, you can use Python arrays to hold the grid variable data.
To get the values from a grid variable, use:
context.getGridVariable('<GridName>')
To write values into a grid variable, use:
context.updateGridVariable('<GridName>', <values>)
The following Python script example takes data from one grid variable, "people", puts the data into a Python array called "p_array", then copies it to different a grid variable, "names":
p_array = context.getGridVariable('people')
context.updateGridVariable('names', p_array)
New values written to a grid variable by Python will be used in all subsequent components that use that grid variable as the pipeline continues to run. This gives you a powerful tool for creating complex yet flexible pipelines, as some very simple Python scripts can dynamically change the behavior of the pipeline at runtime by manipulating variable values.
Note that you can manipulate the variable array in any desired way within the Python script, but the value of the variable within the pipeline itself won't change unless you pass it back with the context
object.
Warning
Python will allow you to change the type of a variable, for example by assigning a string to a variable that previously held an integer. When you intend to pass the variable back to the pipeline using the context
object, you must take care that you do not change the variable type in your Python script. Doing so will have consequences in your pipeline and will likely cause it to fail.
A full discussion of Python scripting is beyond the scope of this article. Refer to any good Python reference for further information.
Python Pushdown
Grid variables are visible to Python scripts created in the Python Pushdown component, and in those scripts they will act as any other Python variable.
In the Python Pushdown, you can use Python's print
object to read grid variable values. Python Pushdown does not require the context
object to read grid variables. You can just refer to it directly, using the example provided below.
To read the values from a grid variable, use:
print (GridName)
To update values into a grid variable, use:
context.updateGridVariable('GridName',[['list','of'],['lists','!']])