Keeps rows that match a set of rules
Filtering is a core feature of Parabola. This step can be used to define what rows you want to keep.
You can use this step to create simple filters with one rule, or complex filters with many rules. All rules defined in a step will be applied with with a logical OR or a logical AND combining them. Default is logical OR and you can switch by using the checkbox at the top of the settings panel.
logical OR means that if ANY rule is evaluated to be true for a row, it is kept. One rule out of 5 could be true, and the row will be kept, or all 5 could be true.
logical AND means that a row will only be kept if ALL rules evaluate as true on that row.
The following conditions are available:
Rules are not case sensitive by default, but can be opted to be case sensitive by clicking on the casing option in the value field.
If you need to only keep values in a column that are between two values, let's say between 5 and 10, then you would just use two rules, combined with an AND.
You would set one rule to keep all rows that are greater than 5. Then you would create a rule to keep all rules that are less than 10. If you need to include values that sit on the 5 or the 10, then use less than or equals to, or greater or equals to.
There are only two date formats that will allow you to filter dates. One is called Unix format and the other is called Lexicographical format.
This is the time format that computers use. It's always in UTC or GMT, and represents either the number of seconds or the number of milliseconds that the date/time is from Jan 1 1970. If you are using a Date Formatter step to get your date into this format, use X or precision to the second and x for precision to the millisecond.
If you have your dates in Unix, and want to know which rows have dates that are before Jan 1 2018, then you would use a filter and tell it to keep all rows that are < (less than) 1514811600 - that timestamp is Jan 1 2018 in Unix time, and the row filter is keeping values less than that number, which would be earlier dates.
This is a fancy word for largest to smallest, or descending order dates. In this format, put the largest date/time concepts on the left, and the smallest on the right. So your date would look like this: 2019-10-24 11:24:00
In this format, you can have any sort of character between each grouping - could be a dash, a space, a colon - whatever! As long as every date in your column is using the character between the year and the month, or between the minute and second, it will work.
Once you have your dates in this format, you can filter in the same way as you would think to filter anything - just make sure that the date you are filtering against in your settings is in the same format.
If you set up a filter with multiple rules, and then think to yourself, "I wish I could see the data set that is being removed by this filter", then follow these steps.
In order to reverse your filter logic, you need to reverse everything.
It may feel weird to do this, or not correct, but I promise this will work. Reverse every filter condition operator, and reverse how they are being joined.
When reversing inequality symbols (things like less than), you need to also reverse whether or not the equals option is attached to it. This is why the reverse of < or = (less than or equals) is > (greater than).
Many times in Parabola, an appropriate solution is to split your table into a few sub tables, process each differently, and then merge the sub tables back into a complete table. In order to do that, you would pass your data into a set of row filters. One row filter per sub table. At the end, you may want to merge them back into a complete table, so ensuring that you don't lose any rows during the filtering process is key.
Symmetrical filtering is a technique to do this. The driving principle is that you should set up your rules across all needed row filters in such a way that they do not overlap with each other, and they do not miss any rows.
For example, let's say you have time-series data that spans 3 months (October, November, and December), and you need to process each month differently. You would want 1 row filter per month.
These are rules you need to create a symmetrical filtering system:
Filter 1 Date < 2019-11-01
Filter 2 Date > or = 2019-11-01 Date < 2019-12-01
Filter 3 Date > or = 2019-12-01
With the above filters, you will capture all dates, and no filter will contain data represented in another filter.
You can always be sure that your filtering is symmetrical by using a Table Merge at the end to merge the data back together, and if the row count in that step is the same as the row count in the step that is sending data into the symmetrical filters, then you did it right.