Pivot
The Pivot transformation component lets you turn your rows of data into columns. This generally means reducing the number of rows and increasing the number of columns.
The Pivot column you choose contains the data that you want to aggregate. The Value column you choose contains labels for that data, which define the groups by which the data in the Pivot column can be aggregated. The Pivot Values you specify (values in the Value column) are used as the new column names in the output.
This component is equivalent to an SQL PIVOT function.
Use case
This component can be used to analyze data by aggregating values from a single dataset in different categories. For example, you can use it to:
- Calculate total sales per product using a dataset containing data from multiple individual sales.
- Monitor average product downtime per day using a dataset containing downtime data for multiple products over a longer period of time.
Properties
Name
= string
A human-readable name for the component.
Aggregate Function
= drop-down
Select which aggregate function to call to combine the grouped values from the Pivot Column. For more information, read the following:
Pivot Column
= drop-down
Specify the column from the source table or sub-query that will be aggregated. This requires an input component before configuring and uses the column names from the input component.
The data type expected in this column depends on the aggregate function you are depending to use. For example, you cannot average non-numeric values.
Value Column
= drop-down
Specify the column from the source table or sub-query that contains the values that will group data for aggregations and from which new column names will be generated. This requires an input component before configuring and uses the column names from the input component.
The chosen Value Column will be missing from the output, replaced by a new column for each of the Pivot Values specified.
Pivot Values
= multiple strings
A list of values that exist in the Value Column in the input data that are used to group data for aggregation. These values become new columns in the output, with data from the Pivot Column being aggregated into these new columns.
Quote Pivot Value Columns
= drop-down
Choose whether to enclose the new column names (the selected pivot values) in quote marks. This can avoid problems with the component if your pivot values contain spaces or special characters, if your pivot values are numeric data, or if your data is in mixed case.
If you use Snowflake, we recommend selecting Yes
because Snowflake has specific rules for column identifiers. Enclosing your pivot values in quotes helps to meet these rules.
Examples
Goal: Calculate total monthly sales for each product
We have the following table, containing data from individual sales of three different products (caps, T-shirts, and hoodies) in January, February, and March. Let's use the Pivot component to group the sales data by month and calculate the total sales for each month.
Input data
Product | Month | Sales |
---|---|---|
T-Shirt | Jan | 100 |
T-Shirt | Jan | 50 |
T-Shirt | Feb | 150 |
T-Shirt | Mar | 200 |
Hoodie | Jan | 200 |
Hoodie | Jan | 100 |
Hoodie | Mar | 250 |
Hoodie | Mar | 180 |
Cap | Jan | 60 |
Cap | Feb | 90 |
Cap | Mar | 120 |
Component configuration
- Aggregate Function: Sum
- Pivot Column: Month
- Value Column: Sales
- Pivot Values: Jan, Feb, Mar
Output data
Product | Jan | Feb | Mar |
---|---|---|---|
T-Shirt | 150 | 150 | 200 |
Hoodie | 300 | 430 | |
Cap | 60 | 90 | 120 |
Result: The pivot values Jan
, Feb
, and Mar
are used as the new column names. The values in the value column Sales
are aggregated using the Sum
function and grouped by the values in the pivot column Month
. There was no sales data for hoodies in February, so this value is null.
Goal: Calculate average student exam results by subject
We have the following table, containing different students' exam results for a range of subjects. Let's use the Pivot component to calculate each student's average score in each subject. The students are identified by their student ID number.
Input data
Student_ID | Subject | Score |
---|---|---|
S001 | Science | 85 |
S001 | Science | 90 |
S001 | History | 78 |
S002 | Geography | 88 |
S002 | History | 92 |
S002 | History | 86 |
Component configuration
- Aggregate Function: Average
- Pivot Column: Subject
- Value Column: Score
- Pivot Values: Science, History, Geography
Output data
Student_ID | Science | History | Geography |
---|---|---|---|
S001 | 87.5 | 78 | |
S002 | 89 | 88 |
Result: The pivot values Science
, History
, and Geography
are used as the new column names. The values in the value column Score
are aggregated using the Average
function and grouped by the values in the pivot column Subject
. There are two null values here, as student S001 has no score for Geography and S002 has no score for Science.