Skip to content

Join

The Join component is used to combine two or more input flows into a single output. It will join rows from two or more tables based on a column with common data (called the foreign key) that relates the tables together.

Read Additional information for more details of how to use the Join component.


Properties

Name = string

A human-readable name for the component.


Main Table = drop-down

Select one of the tables connected to the component's input.


Main Table Alias = string

Enter an alias for the main table. You can repeat the actual table name if desired.


Joins = column editor

Join Table: Select one of the tables connected to the component's input. You can select the main table to perform a self-join.

Join Alias: Each input table requires a unique alias. You can repeat the actual table name if desired. If you are joining the main table to itself, you need to give it a new alias (i.e. different from the one you assigned as the Main Table Alias).

Join Type: Specifies how the join is to be performed:

  • Inner: Performs an SQL inner join.
  • Left: Performs an SQL left join.
  • Right: Performs an SQL right join.
  • Full: Performs an SQL Full Outer join.

Join Expressions = expression editor

Use the expression editor to create the expressions used to define how the joins will operate. There must be exactly one expression for each join to be performed; you can edit these expressions but not add or delete expressions.

Each expression must be valid SQL and can use all of the built-in functions supported by your cloud data warehouse:

There is exactly one expression for each join, and the result of the expression is evaluated as True or False, which indicates whether the two records being compared 'match'. Often this will be a simple 'equality' condition, but it could be more complex, e.g. where a date falls within a start/end date range.


Column Mappings = column editor

  • Input Column: The name of an input column. There may be duplicate names in input flows, which is why it is important to have unique aliases. The alias name is shown to help distinguish the names.
  • Output Column: The output columns must be unique, so you have the opportunity to rename any of the available input columns.

If you later add another table, the output columns are not re-populated, so you will have to add any additional output columns that have been made available.

Click the Add all button to automatically populate the columns, as follows:

  • The Input Column will list all the columns from all the source tables. Each column name is prefixed by the join alias specified in the Joins property, followed by a dot.
  • The Output Column will list each column name from the Input Column, but the dots are replaced with underscores. If the column name is unique across the source tables, there is no prefix.

You can then selectively edit the list if required, but if you click Add all again it will return the list to the default fully populated state, removing any changes you have made.


Additional information

In SQL terms, the Join component generates a SELECT query with multiple inputs in the FROM clause, joined with an ON clause generated from the join expressions you configure in the component. For more details on the use of table joins, refer to any good SQL documentation.

The Join component must have at least two inputs, which provide the two tables that will be joined together. Typically, you will use two Table Input components, each of which is configured to point to one table. You can add a third, fourth, or even more inputs, allowing you to perform complex joins on multiple tables within the same Join component.

Note that it is possible to join a table to itself, in which case you need two (or more) Table Input components that each point to the same table.

When configuring the Join component, you will specify which of the inputs is the Main Table. The main table is considered the left table in terms of SQL joins, and must contain the foreign key, which will be used to join the other input tables to it. The main table must be given an Alias, which is used to refer to the table in join expressions. The reason for this is to allow the table to be joined to itself, as it can be referred to by two different aliases in join expressions to avoid confusion or conflict between which is the left and which is the right table.

Multiple joins can be created in a single Join component. When the component runs, the joins are executed in the order they have been listed in the Joins property, top to bottom, starting with the main table.

Joins are created using an expression editor, which assists you in writing the expressions used to define how the joins will operate. There must be exactly one expression for each join to be performed; you can edit these expressions but not add or delete expressions. Each expression should be a boolean expression which determines which rows in the second (right) table match those in the main (left) table for the purposes of this join. An example is:

"flights"."tailnum" = "planes"."tailnum"

This expression tells the component to select all rows from the input table "planes" where the "tailnum" field matches the "tailnum" field in the input table "flights". The join connects rows ONLY when the join condition is met (i.e. the join expression resolves to "true"). Note that "flights" and "planes" are aliases we have specified for the main table and the second table respectively; we never use actual table names in join expressions.

In the expression editor, creating this expression is as simple as selecting the first field from the Fields list in the editor, selecting the = operator, and then selecting the second field from the list.

Expressions can be more complex than the simple equality illustrated above. For example, an expression can resolve whether a date falls within a particular start/end range.

Each expression must be valid SQL, and can use all the built-in functions of your cloud data warehouse. For descriptions of built-in cloud data warehouse functions, see:

The Join component can perform the following types of standard SQL join:

  • Inner: Performs an SQL inner join.
  • Left: Performs an SQL left join.
  • Right: Performs an SQL right join.
  • Full: Performs an SQL full outer join.

You can also perform a cartesian join, which returns all the rows in all the tables, by creating an inner join with an expression that will always evaluate to "true" (for example, 1 = 1).


Snowflake Databricks Amazon Redshift (preview)