Join
The Join component is used to combine two or more input flows into a single output. It will join rows from two or more tables based on a column with common data (called the foreign key) that relates the tables together.
See Additional information for more details of how to use the Join component.
Note
A bug has been detected with this component in version 1.76.0. We urge customers who use marketplace listings to upgrade to version 1.76.1 or later, where this bug has been fixed.
Properties
Name
= string
A human-readable name for the component.
Main Table
= drop-down
Select one of the tables connected to the component's input.
Main Table Alias
= string
Enter an alias for the main table. You can repeat the actual table name if desired.
Joins
= column editor
- Join Table: Select one of the tables connected to the component's input. You can select the main table to perform a self-join.
- Join Alias: Each input table requires a unique alias. You can repeat the actual table name if desired. If you are joining the main table to itself, you need to give it a new alias (i.e. different from the one you assigned as the Main Table Alias).
- Join Type: Specifies how the join is to be performed:
- Inner: Performs an SQL inner join.
- Left: Performs an SQL left join.
- Right: Performs an SQL right join.
- Full: Performs an SQL full outer join.
Join Expressions
= expression editor
Use the expression editor to create the expressions used to define how the joins will operate. There must be exactly one expression for each join to be performed; you can edit these expressions but not add or delete expressions.
Output Columns
= column editor
- Input Column: The name of an input column. There may be duplicate names in input flows, which is why it's important to have unique aliases. The alias name is shown to help distinguish the names.
- Output Column: The output columns must be unique, so you have the opportunity to rename any of the available input columns. If you later add another table, the output columns are not re-populated, so you will have to add any additional output columns that have been made available.
Additional information
In SQL terms, the Join component generates a SELECT query with multiple inputs in the FROM clause, joined with an ON clause generated from the join expressions you configure in the component. For more details on the use of table joins, refer to any good SQL documentation.
The Join component must have at least two inputs, which provide the two tables that will be joined together. Typically, you will use two Table Input components, each of which is configured to point to one table. You can add a third, fourth, or even more inputs, allowing you to perform complex joins on multiple tables within the same Join component.
Note that it is possible to join a table to itself, in which case you need two (or more) Table Input components that each point to the same table.
When configuring the Join component, you will specify which of the inputs is the Main Table. The main table is considered the left table in terms of SQL joins, and must contain the foreign key, which will be used to join the other input tables to it. The main table must be given an Alias, which is used to refer to the table in join expressions. The reason for this is to allow the table to be joined to itself, as it can be referred to by two different aliases in join expressions to avoid confusion or conflict between which is the left and which is the right table.
Multiple joins can be created in a single Join component. When the component runs, the joins are executed in the order they have been listed in the Joins property, top to bottom, starting with the main table.
Joins are created using an expression editor, which assists you in writing the expressions used to define how the joins will operate. There must be exactly one expression for each join to be performed; you can edit these expressions but not add or delete expressions. Each expression should be a boolean expression which determines which rows in the second (right) table match those in the main (left) table for the purposes of this join. An example is:
"flights"."tailnum" = "planes"."tailnum"
This expression tells the component to select all rows from the input table "planes" where the "tailnum" field matches the "tailnum" field in the input table "flights". The join connects rows ONLY when the join condition is met (i.e. the join expression resolves to "true"). Note that "flights" and "planes" are aliases we have specified for the main table and the second table respectively; we never use actual table names in join expressions.
In the expression editor, creating this expression is as simple as selecting the first field from the Fields list in the editor, selecting the = operator, and then selecting the second field from the list.
Expressions can be more complex than the simple equality illustrated above. For example, an expression can resolve whether a date falls within a particular start/end range.
Each expression must be valid SQL, and can use all the built-in functions of your data warehouse. For descriptions of built-in data warehouse functions, see:
The Join component can perform the following types of standard SQL join:
- Inner: Performs an SQL inner join.
- Left: Performs an SQL left join.
- Right: Performs an SQL right join.
- Full: Performs an SQL full outer join.
You can also perform a cartesian join, which returns all the rows in all the tables, by creating an inner join with an expression that will always evaluate to "true" (for example, 1 = 1
).
Join types are discussed in the video below, but a full discussion of SQL joins is beyond the scope of this article; please refer to any standard SQL documentation if you wish to learn more.
Video
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ |