Extract words from data that has unwanted characters using RegEx
Extracting a series of characters is a powerful data cleaning task used to extract usable information from difficult to process source data. When the data followed specific patterns, as things like automatic logs commonly do, we can use Regular Expressions to create flexible statements which process the data. In Excel, you can use RegexExtract.
Using the Find/Replace Object gives us the option to activate RegEx by clicking the asterisk (*) symbol so that it turns blue. Now, the Find and Replace fields can accept plain text or RegEx statements.
To prepare the data, we split the source column by a dash to isolate the important data points. In one column, we remove non-numerical characters by finding [a-z,A-Z,-] and replacing it with nothing. Then from the name data, we remove any numbers from the column and replace them with nothing. In a separate rule, we replace the dash with a space, resulting in more usable name data.