Removes rows with a duplicate value in a given column
The base functionality of this step is easy to grasp: remove duplicates. To do that, you need to specify the column that should be used to determine if any rows are duplicates of each other. If you need to use multiple columns, try using a Combine columns step first to create a single column that contains the values from all of them, and dedupe on that.
By default, the step keeps one row for each unique value found in the key column (including blanks!). But you can elect to keep more duplicates if needed.
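If it helps to think of this in code, the behavior can be approximated with a short Python sketch. This is only an illustration, not how the step is implemented: `remove_duplicates`, `key`, and `keep` are hypothetical names, rows are assumed to be plain dictionaries, and the first rows encountered are assumed to be the ones kept.

```python
from collections import defaultdict

def remove_duplicates(rows, key, keep=1):
    """Keep at most `keep` rows per distinct value of `key`, in original order."""
    kept = defaultdict(int)  # how many rows have been kept for each key value
    result = []
    for row in rows:
        value = row.get(key, "")  # a missing column is treated like a blank
        if kept[value] < keep:
            kept[value] += 1
            result.append(row)
    return result

rows = [
    {"Email": "a@example.com", "Name": "Ann"},
    {"Email": "a@example.com", "Name": "Ann B."},
    {"Email": "", "Name": "Cy"},
    {"Email": "", "Name": "Di"},
    {"Email": "b@example.com", "Name": "Bo"},
]

# Default: one row per unique Email, and the blank Email counts as a value too.
deduped = remove_duplicates(rows, key="Email")
```

Here the blank Email behaves like any other value, so only one of the two blank rows survives. Passing `keep=2` would retain both.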
Okay, you know your data has duplicates in a certain column, but you actually want to keep some of those other values. You may just need them to be in one row. Click the option to Merge Duplicates. Now, you can select which columns to merge together during the deduping process.
For example, if you had data with Email, Name, and Company data, and you wanted to create a table of data that represented one row per company, but had all emails and names included in a list, you could use this feature. You would dedupe on the Company column, and then Merge the Name and Email columns, using a comma as the delimiter. Useful for creating mail merge lists with threads.
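The Company example can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the step's actual implementation: `merge_duplicates` and its parameters are hypothetical names, and it assumes the first row for each company supplies the values of any columns that are not merged.

```python
def merge_duplicates(rows, key, merge_columns, delimiter=","):
    """Collapse rows sharing the same `key` value, joining `merge_columns`."""
    merged = {}  # key value -> output row; dicts preserve insertion order
    for row in rows:
        value = row[key]
        if value not in merged:
            merged[value] = dict(row)  # first row for this key becomes the base
        else:
            for col in merge_columns:
                # Append this duplicate's value to the running list
                merged[value][col] = merged[value][col] + delimiter + row[col]
    return list(merged.values())

rows = [
    {"Company": "Acme", "Name": "Ann", "Email": "ann@acme.example"},
    {"Company": "Globex", "Name": "Bo", "Email": "bo@globex.example"},
    {"Company": "Acme", "Name": "Cy", "Email": "cy@acme.example"},
]

one_per_company = merge_duplicates(rows, key="Company", merge_columns=["Name", "Email"])
```

The result has one row for Acme, with Name `Ann,Cy` and both emails in a single comma-separated cell, and one untouched row for Globex.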
A common task is to take a list of data and keep only the first N entries (the first 1, 2, 3, and so on), depending on a certain order. For example, what if you needed to take data that represented customers attending your webinar, and only keep the first 20 participants per webinar, so that you would know who registered early?
Your data may have columns for Webinar ID, Time Registered, and Customer Email. In this case, you would want to use a Sort rows step to sort rows by Time Registered so that the earliest datetime stamp is at the top of the table. Then, you can use the Remove duplicate rows step to dedupe on the Webinar ID column, and keep 20 duplicates. This will leave you with the first 20 customers who registered for each webinar.
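The sort-then-dedupe pattern can be sketched in Python like this. It is a minimal illustration, not the tool's implementation: `first_n_per_group` and its parameters are hypothetical names, and the sample uses `n=2` instead of 20 to keep the data short.

```python
from collections import defaultdict
from datetime import datetime

def first_n_per_group(rows, group_key, sort_key, n):
    """Sort by `sort_key` ascending, then keep the first `n` rows per `group_key`."""
    ordered = sorted(rows, key=lambda r: r[sort_key])  # earliest first
    kept = defaultdict(int)  # rows kept so far for each group
    result = []
    for row in ordered:
        if kept[row[group_key]] < n:
            kept[row[group_key]] += 1
            result.append(row)
    return result

registrations = [
    {"Webinar ID": "W1", "Time Registered": datetime(2024, 1, 3, 9, 0), "Customer Email": "c@example.com"},
    {"Webinar ID": "W1", "Time Registered": datetime(2024, 1, 1, 9, 0), "Customer Email": "a@example.com"},
    {"Webinar ID": "W2", "Time Registered": datetime(2024, 1, 2, 10, 0), "Customer Email": "d@example.com"},
    {"Webinar ID": "W1", "Time Registered": datetime(2024, 1, 2, 9, 0), "Customer Email": "b@example.com"},
    {"Webinar ID": "W2", "Time Registered": datetime(2024, 1, 1, 10, 0), "Customer Email": "e@example.com"},
]

# Keep the two earliest registrations for each webinar.
early = first_n_per_group(registrations, "Webinar ID", "Time Registered", n=2)
```

The order matters: sorting must happen before deduping, because the dedupe simply keeps the first rows it encounters for each Webinar ID.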