While on zazzle.com submitting an entry for the Macro Scheduler T-shirt design contest... I somehow ran across this RegEx mug: http://www.zazzle.com/regular_expressio ... 9810962553
I hovered the mouse over the picture of the mug and spotted a section called Lookahead and Lookbehind and that reminded me of a problem I had tried to solve using Macro Scheduler RegEx and EasyPatterns.
I had to extract codes from various types of input such as report output text, CSV files, etc. The codes are numbers (strings of digits), never fewer than 2 digits and never more than 6 digits long. The RegEx pattern to match that is pretty easy:
Let>pattern=[0-9]{2,6}
[0-9] tells it to match a character in the range of 0-9 which is any digit
{2,6} tells it there must be a minimum of 2 in a row up to a maximum of 6 in a row
I could also have used:
Let>pattern=\d{2,6}
because \d is short for [0-9], but I prefer to spell out [0-9]...
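For anyone who wants to try the quantifier outside Macro Scheduler, here is a minimal sketch in Python rather than Macro Scheduler script. The helper name first_code and the sample strings are made up for illustration. One thing worth knowing: in Python 3, \d on ordinary strings also matches non-ASCII digits, so [0-9] is the stricter choice there.

```python
import re

# Same pattern as above: a run of 2 to 6 digits.
pattern = re.compile(r"[0-9]{2,6}")

def first_code(text):
    """Return the first run of 2-6 digits in text, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_code("code 7 only"))   # None: a single digit is too short
print(first_code("code 42 here"))  # 42
print(first_code("id=123456;"))    # 123456
```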
The first problem for me was: what if it ran across a number longer than 6 digits, like 1234567? The simple RegEx above was matching 123456, but I wanted to throw that value away because the real number was 1234567, which is not a valid code. I would have to additionally check whether the character after the match was also a digit and throw the match away if it was. Anything is possible but who needs extra work... and why couldn't RegEx help here?
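To make the "extra work" concrete, here is a rough Python sketch of that manual check (Python, not Macro Scheduler script; the function name codes_with_manual_check is invented for this example). It assumes the check should look at the character before the match as well as the one after, since a leftover tail such as the 78 in 12345678 would otherwise slip through.

```python
import re

# The simple pattern with no lookaround.
simple = re.compile(r"[0-9]{2,6}")

def codes_with_manual_check(text):
    """Keep a match only if no digit touches it on either side."""
    kept = []
    for m in simple.finditer(text):
        # Neighbouring characters, or "" at the ends of the text.
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if before.isdigit() or after.isdigit():
            continue  # part of a longer number, so not a valid code
        kept.append(m.group(0))
    return kept

print(simple.findall("1234567"))            # ['123456']
print(codes_with_manual_check("1234567"))   # []
print(codes_with_manual_check("id 1234 x")) # ['1234']
```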
The EasyPatterns phrase I was trying to take advantage of was:
http://www.datamystic.com/easypatterns_reference.html wrote:
[mustBeginWith(...) ...], [mustNotBeginWith(...) ...]
When a match is found, it must be/must not be preceded by what is in the brackets. The bracket contents are NOT included in the actual match. The bracket contents are limited to fixed length strings - so no '3+' etc are allowed. This must be the first part of your pattern.
[mustBeginWith( 'hello' or 'goodbye' ) 'fred']
There is also mustEndWith.
I thought this was great... I could stipulate that a certain character or string either must be or must not be... just before or just after my target match... and that character or string would not be part of the match.
This was frustrating though because I had several types of chars that might occur just before or after my target match... and I wanted to use RegEx patterns to define those but EasyPatterns was only allowing me to use a fixed length string.
I did not want to match the numbers if they were surrounded by certain other characters, for instance:
-10-
/11/
\12\
45letters
That is because on report output, dates are often written with these separators around them so these are not actually codes I want... but just fragments of a date on a report. Also, if any letters were right up against the code, those were also invalid.
However, the following codes are fine and should be matched as these might be pulled in from a .csv file:
123,
"4567",
,890,
So after seeing that mug, I Googled for "lookahead regex" and eventually ended up here: http://www.regular-expressions.info/lookaround.html
After a bit of experimentation, I had what I needed using ordinary RegEx:
//Goal is to match only 123, 4567 and 890 in the following string:
Let>line= 123, "4567", -10-, /11/, \12\,890, 45letters
Let>pattern=(?<![-/\\0-9a-zA-Z])[0-9]{2,6}(?![-/\\0-9a-zA-Z])
RegEx>pattern,line,0,match_array,num,0
MATCH_ARRAY_1=123
MATCH_ARRAY_2=4567
MATCH_ARRAY_3=890
And that was exactly what I needed. It's sometimes strange how solutions present themselves... but I wanted to share this in case anyone else out there needed something like this... and hadn't yet discovered the power of Regular Expressions with lookahead and lookbehind.
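As a cross-check outside Macro Scheduler, the same pattern can be tried in Python. This is only a sketch, assuming Python's re module handles these lookarounds the same way as the engine behind RegEx>. A negative lookbehind on a single character class is fixed width, so Python accepts it.

```python
import re

# Same test line and pattern as the Macro Scheduler script above.
line = r' 123, "4567", -10-, /11/, \12\,890, 45letters'
pattern = re.compile(r"(?<![-/\\0-9a-zA-Z])[0-9]{2,6}(?![-/\\0-9a-zA-Z])")

matches = pattern.findall(line)
print(matches)                     # ['123', '4567', '890']

# The 7-digit case from earlier is now rejected by the pattern itself.
print(pattern.findall("1234567"))  # []
```

Because 0-9 is inside both lookaround classes, a run longer than 6 digits can no longer produce a partial match, so the separate digit check is not needed.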
Thanks again Marcus for adding RegEx to Macro Scheduler... there's just no "looking back" now!

Take care