Variables
Bug
This page is for Matillion ETL.
Due to a temporary bug, you may land on this page from the Data Productivity Cloud. To read the Data Productivity Cloud documentation for variables, click here.
Variables are name-value pairs stored within each environment. Variables can be used in all sorts of parameters and expressions to allow the user to pass and centralize environment specific configuration. They're also easily accessible through Python and Bash scripts where they are substituted in at run time.
Matillion ETL supports Environment Variables, Automatic Variables, Job Variables, and Grid Variables. For detailed information on each type, please visit the below links. For more general information on variables in Matillion ETL, read on.
Using variables
Variables can be used in place of hard-coded values in component properties. The syntax to include a variable is: ${variable_name}
.
For example, to specify the URL of an S3 bucket in the S3 Load component, you could enter the following property value:
s3://${bucket_name}/${path_name}/
This string includes two variables, bucket_name
and path_name
, which are resolved at run time to dynamically construct a URL every time the job runs, greatly increasing the flexibility of your job.
A variable can include a JavaScript expression, which will be resolved at run time. For example: ${new Date().getFullYear()}
.
Be aware of the following when choosing names for your variables:
- All alpha-numeric characters, as well as underscores, are valid for use in variable names. For example,
my_table_82
. - A variable name can't begin with a digit. For example,
82_my_table
is not a valid name. - If using variables in Python or Bash scripts, avoid naming them in a manner that clashes with key words in either language. One recommended way to avoid conflicts is to add a prefix such as
v_
to every variable name. - When a job runs, variables are resolved by first looking for a job variable, then an environment variable, then a system variable. Thus, if a job variable and environment variable of the same name exists, the job variable will be used.
Behavior (copied and shared)
Bug
This page is for Matillion ETL.
Due to a temporary bug, you may land on this page from the Data Productivity Cloud. To read the Data Productivity Cloud documentation for variables, click here.
Throughout this article, variables may be referred to as Copied and Shared. This refers to their "branch behavior" inside a job. A "branch" in this context means a divergence of connectors within a job, giving a branched structure. Iterator components are also examples of branching, albeit with a different aesthetic. The branch behavior describes how a variable behaves when updated during a job run:
- Copied variables can be updated within one branch of a job without updating variables of the same name in other branches. A branch using a copied variable will begin by taking its default value, even if this variable has been updated in another, parallel branch.
- Conversely, Shared variables are updated in a job-wide fashion. As such, regardless of branches, if a Shared variable is updated in one branch, all other branches will use that updated value.
Note
In previous versions of Matillion ETL, Copied and Shared variables have been referred to as Local and Global, respectively, and can be thought of synonymously when reading the documentation.
Types
The value of any given variable must be one of the following types. This type defines how a variable can be used and is important to consider when passing the variable to components and functions. In the Matillion ETL client, types are often referred to by the symbols shown below:
Type | Description |
---|---|
Text | Any text string. |
Numeric | Any number, with or without decimals (Matillion ETL for Redshift also allows Real and Double Precision types in some features). |
DateTime | Date and time values in the following formats yyyy-MM-dd , yyyy-MM-dd HH:mm , yyyy-MM-dd HH:mm:SS , or yyyy-MM-dd HH:mm:SS.SSS . |
Data Structure | Data Structure Variables dynamically populate parameters that require a structured object. |
Text Mode
On many menus, such as Manage Job Variables and Manage Environment Variables, users can switch to a Text Mode by ticking the checkbox next to Text Mode. Switching to Text Mode allows users to:
- Edit variables as a set of space-delimited text entries.
- Convert all existing variable data into an appropriate format.
- Copy and paste text, offering a fast way to move large numbers of variables.
- Have certain types of missing data auto-completed by Matillion ETL when parsing.
- Error check data before adding it to the project (Matillion ETL parses the data on exiting Text Mode, throwing a message if an error has been made).
Using variables in scripts
Python scripts
Variables are visible to Python scripts created in the Python Script component, and in those scripts they will act as any other Python variable. For example, to show the value of a job variable myVar
you can use the following Python command:
print (myVar)
From the Python script, you can use Python's context
object to assign a new value to the variable. For example:
context.updateVariable ('myVar', 'a_new_value')
This new value for the variable will now be used in all subsequent components as the job continues to run. This gives you a powerful tool for creating complex yet flexible jobs, as some very simple Python scripts can dynamically change the behavior of the job at runtime by manipulating variable values.
You can manipulate the variable in any desired way within the Python script, but the value of the variable within the job itself won't change unless you pass it back with the context
object.
Warning
Python will allow you to change the type of a variable, for example by assigning a string to a variable that previously held an integer. When you intend to pass the variable back to the job using the context
object, you must take care that you don't change the variable type in your Python script. Doing so will have consequences in your job and may cause it to fail.
A full discussion of Python scripting is beyond the scope of this article. Refer to any good Python reference for further information.
Bash scripts
Variables are visible to Bash scripts created in the Bash Script component. Any changes made to such variables within the script will never be visible outside of the current script execution, so variables can't be updated in the script and then passed back to the job.
To use a variable in a Bash script, prefix the variable name with $
, as follows:
echo $myVar
Filtering Booleans
It's possible to use boolean statements to filter values. In the example below, you can type True or False for the value combined with AND and OR conditional statements. You can also use 1 and 0 for the values, which work as True or False values, respectively.
Example 1
As a basic example, a defined variable can be used to perform a simple calculation in a Transformation Job. The job consists of a Generate Sequence component linked to a Calculator component. The Generate Sequence component can be used to create a column of numbers that are passed to the Calculator.
- Click the Generate Sequence component icon. Then, in the Properties panel, set the Starting Value, Increment Value and End Value to "1", "1" and "10", respectively. This will create a column of data that counts from 1 to 10.
-
Next, click on the Calculator component icon. Enter the Calculation Editor and add a new calculation. This will allow values in the sequence row (output by the Generate Sequence component) to be multiplied, and then multiply that value by a number. In this case, each value in the sequence column is multipled by 4 using the following calculation:
"sequence"*4
-
The output of the Calculator can be viewed by clicking the Sample tab from within the component's Properties panel, then clicking ⟳ Data. It will show the sequence column is being multiplied by the number specified in the calculation.
- A variable could have been used in this calculation instead. First, a variable must be declared via Project, then Manage Environment Variables. A new variable can be declared here and given a type, value and scope. Since the variable for this example will be used in a simple calculation, the Numeric type is appropriate. The scope is of little consequence for this job and can be set to Shared.
- The default value can then be set to any number.
-
Finally, the Calculator component must be instructed to use this variable in its calculation. Reentering the Calculation Editor, the constant multiplier can be replaced with the newly declared variable using the following calculation:
"sequence" * ${example_var}
-
Checking the sample output confirms that the Calculator component has correctly used the variable's default value in the calculation.
Example 2
In this example, two environments have been created, namely Test and Live. A common use case is to work on a more limited dataset in Test while having a full dataset in Live. A "starting_year" variable will define how much data is to be used in the Test and Live environments when applied in a filter.
- Variables can be predefined, with a default value for each environment. It is not necessarily true that two environments have different default values. However, in this example, the variable "starting_year" will be set to an earlier value for use with the Live environment (for example,
2005
). In the Test environment, less data is required and can thus start collecting it from a much later year (for example,2015
). - Now that a variable has been defined, it can normally be used in all expression editors and free text fields. In this example, the variable is used by a Filter component to limit the dataset of a Table Input component.
-
The Filter component is set up to filter any table rows with values greater than
${starting_year)
in the "year" input column. Due to how default values were defined for "starting_year", if this job is run in the Test environment, only flights from 2016 onward will remain. In the Live environment, flights from 2006 onward will remain.Note
JavaScript expressions are only available to use as parameter values within Matillion ETL components. Any valid single JavaScript expression may be used, however it is recommended that only simple expressions are used. All variables defined in the job and / or environment should also be available to reference.
Warning
When entering values into a component's parameter editor, everything enclosed within
${ }
of the literal string will be evaluated immediately. This validation process currently does not take variables into account and may assume the value is incorrect. -
The SQL generated by this component and its settings can be viewed by clicking the SQL tab. The example below shows the SQL for the Live environment.
SELECT * FROM ($T{Flights}) WHERE ("year" > '2005')
-
Instead of relying on variables with default values, a JavaScript expression can also be used. JavaScript expressions do not require any variables to be defined in the environment. Using the JavaScript expression
${new Date().getFullYear()-10}
in a filter condition will generate the following SQL, given that the current year generated by the JavaScriptDate()
function is 2024:SELECT * FROM ($T{Flights}) WHERE ("year" > '2014.0')