Aggregate
The Aggregate component groups together multiple input rows into a single output row. Input columns can be added to the groupings or have an aggregation applied to them.
Default output names are chosen by combining the source column name and the aggregation type. If they are not appropriate, consider using a Rename component immediately after the Aggregate component to choose more appropriate names.
Properties
Name
= string
A human-readable name for the component.
Groupings
= dual listbox
Select one or more columns from the source table that will form the groupings.
The output flow will contain one row for each distinct combination of values within the grouping columns.
Aggregations
= columns editor
- Source Columns: Select the input (source) column for the summary function (the summary function is the aggregation type).
- Aggregation Type: Users can add the same source column multiple times, each with a different aggregation type. The summary is calculated for each combination of the source column values.
Matillion ETL currently supports over 25 Snowflake aggregate functions. Read Snowflake's Aggregate Functions documentation for a detailed description of each function.
Grouping Type
= drop-down
Select how to group rows.
- Group By: Groups rows with the same group-by-item expressions and computes aggregate functions for the resulting group. A GROUP BY expression can be a column name, a number referencing a position in the SELECT list, or a general expression.
- Group By Cube: GROUP BY CUBE is an extension of the GROUP BY clause similar to GROUP BY ROLLUP. In addition to producing all the rows of a GROUP BY ROLLUP, GROUP BY CUBE adds all the "cross-tabulations" rows. Subtotal rows are rows that further aggregate whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.
- Group By Grouping Sets: GROUP BY GROUPING SETS is a powerful extension of the GROUP BY clause that allows computing multiple group-by clauses in a single statement. The group set is a set of dimension columns.
- Group By Rollup: GROUP BY ROLLUP is an extension of the GROUP BY clause that produces subtotal rows (in addition to the grouped rows). Subtotal rows are rows that further aggregate whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.
Name
= string
A human-readable name for the component.
Groupings
= dual listbox
Select one or more columns from the source table that will form the groupings.
The output flow will contain one row for each distinct combination of values within the grouping columns.
Aggregations
= columns editor
- Source Columns: Select the input (source) column for the summary function (the summary function is the aggregation type).
- Aggregation Type: Users can add the same source column multiple times, each with a different aggregation type. The summary is calculated for each combination of the source column values. Available functions include:
- Approximate Count
- Average
- Count
- Count Distinct
- Max
- Min
- Standard Deviation - Sample
- Standard Deviation - Population
- Sum
- Variance - Sample
- Variance - Population
- Sum Distinct
- Skewness
- Kurtosis
For more information on each available function, click here.
Name
= string
A human-readable name for the component.
Groupings
= dual listbox
Select one or more columns from the source table that will form the groupings.
The output flow will contain one row for each distinct combination of values within the grouping columns.
Aggregations
= columns editor
- Source Columns: Select the input (source) column for the summary function (the summary function is the aggregation type).
- Aggregation Type: Users can add the same source column multiple times, each with a different aggregation type. The summary is calculated for each combination of the source column values.
- Min: Find the Minimum value of the source column.
- Max: Find the Maximum value of the source column.
- Count: Count the total number of records whose source column is not null.
- Count Distinct: Count the total number of distinct (unique) values of the source column.
- Approximate Count: This is a variant of Count Distinct that may have better performance in cases where an exact count is not required. See the Redshift documentation for more details.
- Sum: Sum the values in the source column. This is only sensible for numeric source columns.
- Average: Average the values in the source column. This is only sensible for numeric source columns.
- Standard Deviation (sample): Calculates the sample standard deviation of the source column. This is only sensible for numeric source columns. - Standard Deviation (population): Calculates the population standard deviation of the source column. This is only sensible for numeric source columns.
- Variance (sample): Calculates the sample variance of the source column. This is only sensible for numeric source columns.
- Variance (population): Calculates the population variance of the source column. This is only sensible for numeric source columns.
- Median: Calculates the median value for the range of values. For more information, refer to the Redshift Median documentation.
For more information on the Aggregation Types, refer the Redshift Aggregate Functions documentation.
Name
= string
A human-readable name for the component.
Groupings
= dual listbox
Select one or more columns from the source table that will form the groupings.
The output flow will contain one row for each distinct combination of values within the grouping columns.
Aggregations
= columns editor
- Source Columns: Select the input (source) column for the summary function (the summary function is the aggregation type).
- Aggregation Type: Users can add the same source column multiple times, each with a different aggregation type. The summary is calculated for each combination of the source column values.
- Min: Find the Minimum value of the source column.
- Max: Find the Maximum value of the source column.
- Count: Count the total number of records whose source column is not null.
- Count Distinct: Count the total number of distinct (unique) values of the source column.
- Approximate Count: This is a variant of Count Distinct that may have better performance in cases where an exact count is not required.
- Sum: Sum the values in the source column. This is only sensible for numeric source columns.
- Average: Average the values in the source column. This is only sensible for numeric source columns.
- Standard Deviation (sample): Calculates the sample standard deviation of the source column. This is only sensible for numeric source columns. - Standard Deviation (population): Calculates the population standard deviation of the source column. This is only sensible for numeric source columns.
- Variance (sample): Calculates the sample variance of the source column. This is only sensible for numeric source columns.
- Variance (population): Calculates the population variance of the source column. This is only sensible for numeric source columns. For more information on Aggregation functions in BigQuery, see the BigQuery documentation.
- Array Aggregate: Constructs an array from column data. See the BigQuery documentation for more information.
Name
= string
A human-readable name for the component.
Groupings
= dual listbox
Select one or more columns from the source table that will form the groupings.
The output flow will contain one row for each distinct combination of values within the grouping columns.
Aggregations
= columns editor
- Source Columns: Select the input (source) column for the summary function (the summary function is the aggregation type).
- Aggregation Type: Users can add the same source column multiple times, each with a different aggregation type. The summary is calculated for each combination of the source column values. Available functions include:
- Approximate Count
- Average
- Count
- Count Distinct
- Max
- Min
- Standard Deviation - Sample
- Standard Deviation - Population
- Sum
- Variance - Sample
- Variance - Population
- Average Distinct
- Standard Deviation Distinct
- Standard Deviation Population Distinct
- Sum Distinct
- Variance Distinct
- Variance Population Distinct
For more information on each available function, click here.
Strategy
The Aggregate component generates summary (aggregate) functions along with a group-by clause.
Video
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ |