Tips on using the pivot columns and unpivot columns steps

Pivot columns

Takes exactly two columns, and pivots them into a dynamic number of columns.
One column is used to create the new headers, and the other column is used to fill in cell values.
During this operation, duplicate rows are removed.
This step tends to add columns and remove rows, but does not destroy data.
In excel, this might be expressed as transposing rows into columns.

When to use this step

Use this step when you have two columns that represent a hierarchical relationship, key value pairs, or something like “category” and “details”. If the data is unique based on two columns, this can be a way to reshape the data to be unique based on one column. Example For example, you may have a list that shows the number of employees per department per location. That means you will see the “Engineering” department show up in two rows - once for San Francisco, and once for New York. But if you want the data to be indexed only by department, showing each department on a single row, you can pivot the data. Set the “Column to pivot into new column headers” as the location column, and “Column with values to fill in” as the employee count column. This will ensure that each department is represented by a single row, and we have added columns for San Francisco and New York to represent the employee counts in those locations. If a department does not have any employees in New York, that department may have only had one row in the source data. As a result, their cell in the New York column will be blank. Potential issues This process assumes that the desired indexing of the table after the step is used is valid. In our example, it assumes that the only reason a department shows up on multiple rows is to express the employee count in multiple locations. Without the columns used in the pivot, the data would otherwise not be unique. However, if there is other data that makes each row unique, the outcome can be unexpected. For example, what if there is an additional column that indicates the name of the office manager in each location as well. New York’s office manager is “Cruet” and San Francisco’s office manager is “Truce”. If we pivot the location column and fill it in with the number of employees, we will still see multiple rows per department, even though new columns per location have been created. That is because the office manager is still unique per location, and that data must be preserved. To get around this, the columns must be removed ahead of time, using an Edit columns step, to ensure that multiple rows can be collapsed into a single row without loss of precision.

Unpivot columns

Requires 1 column to be used as a unique identifier, and takes any number of columns and converts them all into rows.
This results in the target columns coalescing into a set of two columns - one called “Type” that indicates the old column headers, and one called “Value” that indicates the cell values.
Blanks can be ignored optionally.
This step tends to remove columns and add rows, but does not destroy data (it can remove blanks, though).
In excel, this might be expressed as transposing columns into rows.
If you want to unpivot every column, choose “all columns except” and leave the column selector blank.

When to use this step

Use this step when you have many columns that represent attributes of your main index. The goal of using this step is to take many columns and consolidate them into a a set of two columns, creating many rows in the process. The data needs an unique identifier, but after that any number of (even all) columns can be unpivoted. If the data is already unique based on a single column, this step is a way to take values from non-primary columns and unpivot them down into additional rows, making the data unique based on two columns. Example For example, you may have a list of departments that shows the number of employees of that department in each office location, each location showing up as a column. So the Engineering department row has a column for New York and a column for San Francisco. If you want the data to instead show one row per department per location, you would unpivot the data. Set the “Unique identifier column” to the unique column in the data. If there is not a unique column, it may still work by selecting a column that is not going to be unpivoted. In this case, the department column. Select the columns to “Pivot”, which will be the columns that are converted into new rows. In this case, the San Francisco and New York column. If the columns are not always going to be the same, you can choose to pivot all columns except, and select the stable columns that you do not want to pivot. This will ensure that each combination of department and location is represented by a single row, and we have added rows for the employee counts in San Francisco and New York.

FAQs

Documentation Index

​Pivot columns

​When to use this step

​Unpivot columns

​When to use this step

Pivot columns

When to use this step

Unpivot columns

When to use this step