Skip to content

Aggregate

The Aggregate component generates summary (aggregate) functions along with a group-by clause.

The Aggregate component groups together multiple input rows into a single output row. Input columns can be added to the groupings or have an aggregation applied to them.

Default output names are chosen by combining the source column name and the aggregation type. If they are not appropriate, consider using a Rename component immediately after the Aggregate component to choose more appropriate names.


Properties

Name = string

A human-readable name for the component.


Groupings = dual listbox

Select one or more columns from the source table that will form the groupings. The output flow will contain one row for each distinct combination of values within the grouping columns.


Aggregations = column editor

Select the input (source) column for the summary function.


Grouping Type = drop-down

Select one of the following methods for grouping rows:

  • GROUP_BY: Groups rows with the same group-by-item expressions and computes aggregate functions for the resulting group. A Snowflake GROUP_BY expression can be a column name, a number referencing a position in the SELECT list, or a general expression.
  • GROUP_BY_CUBE: The Snowflake GROUP_BY_CUBE clause is an extension of the GROUP_BY clause similar to GROUP_BY_ROLLUP. In addition to producing all the rows of a GROUP_BY_ROLLUP, GROUP_BY_CUBE adds all the "cross-tabulations" rows. Subtotal rows are rows that further aggregate, whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.
  • GROUP_BY_GROUPING_SETS: A Snowflake GROUP_BY_GROUPING_SETS clause is a powerful extension of the GROUP_BY clause that allows computing multiple group-by clauses in a single statement. The group set is a set of dimension columns.
  • GROUP_BY_ROLLUP: A Snowflake GROUP_BY_ROLLUP clause is an extension of the GROUP_BY clause that produces subtotal rows (in addition to the grouped rows). Subtotal rows are rows that further aggregate, whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.

Name = string

A human-readable name for the component.


Groupings = dual listbox

Select one or more columns from the source table that will form the groupings. The output flow will contain one row for each distinct combination of values within the grouping columns.


Aggregations = column editor

  • Column Name: Select the input (source) column for the summary function.
  • Aggregate Type: The summary function to use for the column. Available functions include:
    • Approximate Count
    • Average
    • Count
    • Count Distinct
    • Max
    • Min
    • Standard Deviation - Sample
    • Standard Deviation - Population
    • Sum
    • Variance - Sample
    • Variance - Population
    • Sum Distinct
    • Skewness
    • Kurtosis

You can add the same source column multiple times, each with a different aggregate type. The summary is calculated for each combination of the source column values.

For more information on each available function, read this guide to SQL functions.

Name = string

A human-readable name for the component.


Groupings = dual listbox

Select one or more items from the Groupings column to confirm the Selected Groupings.


Aggregations = column editor

Select the Column Name you want to aggregate, and choose the type of aggregation you want to perform by selecting the Aggregation Type:

  • Approximate Count: This is a variant of Count Distinct that may have better performance in cases where an exact count is not required. For more information, read the Redshift documentation.
  • Average: Average the values in the column name. This is only sensible for numeric column names.
  • Count: Count the total number of records whose column name is not null.
  • Count Distinct: Count the total number of distinct (unique) values of the column name.
  • Max: Find the Maximum value of the column name.
  • Median: Calculates the median value for the range of values. For more information, read the Redshift Median documentation.
  • Min: Find the Minimum value of the column name.
  • Standard Deviation - Population: Calculates the population standard deviation of the column name. This is only sensible for numeric columns names.
  • Standard Deviation - Sample:** Calculates the sample standard deviation of the column name. This is only sensible for numeric columns names.
  • Sum: Sum the values in the column name. This is only sensible for numeric columns names.
  • Variance - Population: Calculates the population variance of the column name. This is only sensible for numeric columns names.
  • Variance - Sample: Calculates the sample variance of the column name. This is only sensible for numeric columns names.

For more information on the Aggregation Types, read the Redshift Aggregate Functions documentation.


Snowflake Databricks Amazon Redshift (preview)