RegEx> example using Lookahead and Lookbehind

Technical support and scripting issues

jpuziano
Automation Wizard
Posts: 1085
Joined: Sat Oct 30, 2004 12:00 am

Post by jpuziano » Sun Jan 31, 2010 5:55 pm

Hi Everyone,

While on zazzle.com submitting an entry for the Macro Scheduler T-shirt design contest... I somehow ran across this RegEx mug: http://www.zazzle.com/regular_expressio ... 9810962553

I hovered the mouse over the picture of the mug, spotted a section called Lookahead and Lookbehind, and was reminded of a problem I had tried to solve using Macro Scheduler's RegEx and EasyPatterns.

I had to extract codes from various types of input such as report output text, csv files, etc. The codes are numbers (strings of digits) never less than 2 digits and never more than 6 digits long. The RegEx pattern to match that is pretty easy:

Let>pattern=[0-9]{2,6}

[0-9] tells it to match one character in the range 0-9, i.e. any digit
{2,6} tells it there must be a minimum of 2 and a maximum of 6 of them in a row
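For anyone who wants to try the basic pattern outside Macro Scheduler, here is a small sketch using Python's `re` module. It only shows what `[0-9]{2,6}` does on its own. The sample text is made up, and Macro Scheduler's RegEx> engine may differ from Python in some details.

```python
import re

# [0-9] matches one ASCII digit; {2,6} requires 2 to 6 of them in a row.
CODE = re.compile(r"[0-9]{2,6}")

text = "codes: 7, 42, 123456"
matches = CODE.findall(text)
print(matches)  # ['42', '123456']  (the lone 7 is too short to match)
```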

I could also have used:

Let>pattern=\d{2,6}

because \d is shorthand for [0-9], but I prefer the explicit [0-9] form...
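There is at least one concrete reason to prefer [0-9], although whether it applies depends on the engine. In Python 3, `\d` in a str pattern matches any Unicode decimal digit, while `[0-9]` matches only ASCII digits. This sketch shows that difference using Python's behaviour. It does not claim anything about Macro Scheduler's engine.

```python
import re

# In Python 3, \d on a str pattern matches any Unicode decimal digit,
# while [0-9] matches only the ASCII digits 0 through 9.
arabic_indic = "\u0663\u0664"  # ARABIC-INDIC DIGIT THREE, ARABIC-INDIC DIGIT FOUR

with_d = re.findall(r"\d{2,6}", arabic_indic)
with_class = re.findall(r"[0-9]{2,6}", arabic_indic)
# re.ASCII restricts \d to [0-9].
with_ascii_flag = re.findall(r"\d{2,6}", arabic_indic, flags=re.ASCII)

print(len(with_d), with_class, with_ascii_flag)  # 1 [] []
```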

The first problem for me was: what if it ran across a number longer than 6 digits, like 1234567? The simple RegEx above matched 123456, but I wanted to throw that value away, because the number was really 1234567, which is not a valid code. I would have had to additionally check whether the character after the match was also a digit and discard the match if it was. Anything is possible, but who needs extra work... and why couldn't RegEx help here?
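Here is a rough Python sketch of both the problem and the "extra work" version of the fix, which checks neighbouring characters by hand after each match. The helper name `codes_with_manual_check` is made up for illustration. The sketch also shows why checking only the character after a match is not quite enough once the scan continues past a rejected match.

```python
import re

CODE = re.compile(r"[0-9]{2,6}")

# The plain pattern happily grabs the first 6 digits of a 7-digit number.
print(CODE.findall("1234567"))  # ['123456']

def codes_with_manual_check(text):
    """Hypothetical helper: keep a match only if neither neighbour is a digit.

    Checking only the character *after* the match is not enough: in
    "12345678" the first match 123456 is rejected, but the scan then
    resumes at "78", which is followed by end-of-string and would slip
    through unless the character *before* it is checked as well.
    """
    result = []
    for m in CODE.finditer(text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if not before.isdigit() and not after.isdigit():
            result.append(m.group())
    return result

print(codes_with_manual_check("1234567 12345678 42"))  # ['42']
```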

The EasyPatterns phrase I was trying to take advantage of was this one, from http://www.datamystic.com/easypatterns_reference.html:

[mustBeginWith(...) ...], [mustNotBeginWith(...) ...]

When a match is found, it must be/must not be preceded by what is in the brackets. The bracket contents are NOT included in the actual match. The bracket contents are limited to fixed length strings - so no '3+' etc are allowed. This must be the first part of your pattern.

[mustBeginWith( 'hello' or 'goodbye' ) 'fred']
There is also mustEndWith.
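The closest regex construct to mustBeginWith is a lookbehind. Python's `re` has a fixed-width restriction much like the one EasyPatterns describes: each lookbehind must match a single fixed length. The sketch below translates the 'hello' or 'goodbye' example on that assumption. Because the two words have different lengths, each gets its own lookbehind. This is only a rough equivalent, not a statement of how EasyPatterns compiles its phrases.

```python
import re

# Rough Python equivalent of the EasyPatterns example
#   [mustBeginWith( 'hello' or 'goodbye' ) 'fred']
# Python's re needs every lookbehind to have a single fixed width, so
# "hello" (5 chars) and "goodbye" (7 chars) get one lookbehind each.
FRED = re.compile(r"(?:(?<=hello)|(?<=goodbye))fred")

m = FRED.search("goodbyefred")
print(m.group(), m.span())  # fred (7, 11)
print(FRED.search("hifred"))  # None

# A variable-length lookbehind is rejected at compile time,
# much like EasyPatterns refuses '3+' inside mustBeginWith.
try:
    re.compile(r"(?<=[0-9]+)fred")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)
```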

I thought this was great... I could stipulate that a certain character or string either must be or must not be... just before or just after my target match... and that character or string would not be part of the match.
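Regex lookaround behaves the same way: `(?<=...)` and `(?=...)` test the surrounding text but do not consume it. Here is a small Python sketch, using a made-up quoted code, that shows the quotes are checked but left out of the match.

```python
import re

# Lookbehind (?<=...) and lookahead (?=...) are zero-width: they test the
# surrounding text but do not consume it, so it is not part of the match.
m = re.search(r'(?<=")[0-9]{2,6}(?=")', 'id: "4567",')
print(m.group())  # 4567  (no quote marks)
print(m.span())   # (5, 9)
```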

This was frustrating though because I had several types of chars that might occur just before or after my target match... and I wanted to use RegEx patterns to define those but EasyPatterns was only allowing me to use a fixed length string.
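What gets around the fixed-length limit, at least in engines like Python's `re`, is that a character class such as `[-/\\]` always matches exactly one character. That means it has a fixed width even though it stands for several different characters. A minimal sketch with made-up input:

```python
import re

# A character class always matches exactly one character, so it has a
# fixed width and is allowed inside a lookbehind even though it stands
# for several different characters (here: -, / and backslash).
NOT_AFTER_SEPARATOR = re.compile(r"(?<![-/\\])[0-9]{2}")

print(NOT_AFTER_SEPARATOR.findall(r"-10 /11 \12 13"))  # ['13']
```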

I did not want to match the numbers if they were surrounded by certain other characters, for instance:

-10-
/11/
\12\
45letters

That is because in report output, dates are often written with these separators around their parts, so these are not actually codes I want... just fragments of a date on a report. Also, if any letters were right up against a number, that was not a valid code either.

However, the following codes are fine and should be matched as these might be pulled in from a .csv file:

123,
"4567",
,890,

So after seeing that mug, I Googled for "lookahead regex" and eventually ended up here: http://www.regular-expressions.info/lookaround.html
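That page covers four lookaround forms. Here is a quick Python sketch of all four on a tiny made-up string:

```python
import re

text = "a1 b2 a3"

ahead = re.findall(r"[a-z](?=1)", text)       # letter followed by "1"
not_ahead = re.findall(r"[a-z](?!1)", text)   # letter NOT followed by "1"
behind = re.findall(r"(?<=a)[0-9]", text)     # digit preceded by "a"
not_behind = re.findall(r"(?<!a)[0-9]", text) # digit NOT preceded by "a"

print(ahead)       # ['a']
print(not_ahead)   # ['b', 'a']
print(behind)      # ['1', '3']
print(not_behind)  # ['2']
```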

After a bit of experimentation, I had what I needed using ordinary RegEx:

//Goal is to match only 123, 4567 and 890 in the following string:
Let>line=  123, "4567", -10-, /11/, \12\,890, 45letters

Let>pattern=(?<![-/\\0-9a-zA-Z])[0-9]{2,6}(?![-/\\0-9a-zA-Z])

RegEx>pattern,line,0,match_array,num,0

Stepping through with the Debugger, I can see that:

MATCH_ARRAY_1=123
MATCH_ARRAY_2=4567
MATCH_ARRAY_3=890
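To check the same pattern outside Macro Scheduler, here is a sketch of the script ported to Python's `re`. Python accepts the pattern unchanged because each lookaround is a single character class and therefore fixed width. Raw string literals (`r"..."`) keep the backslashes exactly as written in the Let> lines. This assumes Macro Scheduler's engine interprets the pattern the same way, which the debugger output above suggests for this input.

```python
import re

# Same test string and pattern as the Macro Scheduler script above.
line = r'  123, "4567", -10-, /11/, \12\,890, 45letters'
pattern = re.compile(r"(?<![-/\\0-9a-zA-Z])[0-9]{2,6}(?![-/\\0-9a-zA-Z])")

match_array = pattern.findall(line)
print(len(match_array), match_array)  # 3 ['123', '4567', '890']
```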

And that was exactly what I needed. It's sometimes strange how solutions present themselves... but I wanted to share this in case anyone else out there needed something like this and hadn't yet discovered the power of regular expressions with lookahead and lookbehind.
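One detail worth spelling out is that the 0-9 inside both lookaround classes is what handles the 1234567 case from earlier. The Python sketch below compares the pattern with and without those digits. The variant without digits is made up for comparison, and the samples are invented.

```python
import re

WITH_DIGITS = re.compile(r"(?<![-/\\0-9a-zA-Z])[0-9]{2,6}(?![-/\\0-9a-zA-Z])")
# Hypothetical variant: the same pattern with 0-9 removed from both lookaround classes.
WITHOUT_DIGITS = re.compile(r"(?<![-/\\a-zA-Z])[0-9]{2,6}(?![-/\\a-zA-Z])")

samples = ["1234567", "12345678", "123456"]
results = {s: (WITH_DIGITS.findall(s), WITHOUT_DIGITS.findall(s)) for s in samples}

for s, (with_d, without_d) in results.items():
    print(s, with_d, without_d)
# 1234567 [] ['123456']
# 12345678 [] ['123456', '78']
# 123456 ['123456'] ['123456']
```

Without the digit in the lookahead, the first six digits of a longer number are accepted. Without the digit in the lookbehind, the leftover tail ("78") is accepted as well.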

Thanks again Marcus for adding RegEx to Macro Scheduler... there's just no "looking back" now! :lol:

Take care
jpuziano

Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US

Post by Bob Hansen » Sun Jan 31, 2010 7:26 pm

Thanks for a great explanation of LookAhead/LookBehind usage. I have used that RegEx site, http://www.regular-expressions.info, for a few years; it has excellent documentation and samples.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!
