Technical support and scripting issues
Moderators: JRL, Dorian (MJT support)
-
armsys
- Automation Wizard
- Posts: 1108
- Joined: Wed Dec 04, 2002 10:28 am
- Location: Hong Kong
by armsys » Fri May 17, 2013 3:23 am
The RegEx bug has been tormenting me for years, so I am restating it here to help everyone understand it.
Consider the following script:
Let>Text=provocateur [ prō-ˌvä-kə-'tər ] pro•vo•ca•teur
Let>pattern=([ ])
Let>Replace=%SPACE%
RegEx>pattern,Text,0,matches,numMatches,1,Replace,Text1
MDL>%Text% => %Text1%
In theory, it does nothing: it simply replaces each space with a space.
The main issue should be immediately obvious to everyone: it transforms "prō-ˌvä-kə-'tər" into "pr?-?va-k?-'t?r".
Marcus, would you please take a serious look into the issue?
Thanks.
Last edited by armsys on Fri May 17, 2013 8:29 am, edited 5 times in total.
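The thread does not pin down where the "?" characters come from. One plausible mechanism, offered purely as an assumption, is a conversion from Unicode to an ANSI (non-Unicode) code page somewhere in the processing: any character the code page cannot represent is replaced by "?". The Python sketch imitates this with a hypothetical helper `lossy_ansi_roundtrip`, using code page 1252 as an arbitrary example; the code page and conversion routine actually involved are not known here (the reported output turns "ä" into "a", so the real conversion presumably differs). A Unicode-aware "replace a space with a space" leaves the text untouched, while the round trip alone already produces this kind of damage.

```python
import re

TEXT = "provocateur [ prō-ˌvä-kə-'tər ] pro•vo•ca•teur"


def unicode_replace(pattern: str, repl: str, text: str) -> str:
    """Regex replace performed entirely on Unicode (str) data."""
    return re.sub(pattern, repl, text)


def lossy_ansi_roundtrip(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical: encode to an ANSI code page and decode back.

    Characters the code page cannot represent come back as '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    # On Unicode data, replacing a space with a space is a true no-op.
    print(unicode_replace(r"([ ])", " ", TEXT) == TEXT)  # True
    # The same no-op replace on text that went through the lossy round trip.
    print(unicode_replace(r"([ ])", " ", lossy_ansi_roundtrip(TEXT)))
    # provocateur [ pr?-?vä-k?-'t?r ] pro•vo•ca•teur
```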
-
armsys
by armsys » Fri May 17, 2013 8:23 am
To view this post correctly in Firefox, please use the following setting:
View > Character Encoding > Western (Windows-1252)
-
armsys
by armsys » Fri May 17, 2013 9:12 am
Marcus,
Thanks for your reply.
Surprisingly, the same script works perfectly with Chinese characters,
but it fails with the English phonetic characters.
BTW, the system locale of my Win 7 Ultimate (i.e., the language for non-Unicode programs) is set to Chinese (Traditional, Hong Kong S.A.R.).
I don't think it has anything to do with the RegEx bug.
On the other hand, StringReplace> works perfectly with both English and Chinese.
Of course, I prefer RegEx because it's more powerful and flexible.
Thanks again for your help.
-
armsys
by armsys » Fri May 17, 2013 9:02 pm
Marcus,
Referring to:
http://www.regular-expressions.info/pcre.html
Compiling PCRE with Unicode Support
By default, PCRE compiles without Unicode support. If you try to use \p, \P or \X in your regular expressions, PCRE will complain it was compiled without Unicode support.
To compile PCRE with Unicode support, you need to define the SUPPORT_UTF8 and SUPPORT_UCP conditional defines. If PCRE's configuration script works on your system, you can easily do this by running ./configure --enable-unicode-properties before running make.
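What "Unicode support" changes can be seen in how shorthand classes such as `\w` behave. In PCRE, the UTF-8 option makes the engine treat the subject as UTF-8 rather than single bytes, and the UCP option makes `\w`, `\d`, `\s` and `\b` use Unicode properties instead of ASCII only. Python's `re` module is used below as an analogy, not as PCRE itself: on `str` data it is Unicode-aware by default, and the `re.ASCII` flag restricts `\w` to ASCII, roughly like PCRE without UCP. (Python's `re` does not support `\p{...}`.) The helper name `words` is illustrative.

```python
import re


def words(text: str, ascii_only: bool) -> list[str]:
    """Find runs of word characters, with or without Unicode semantics."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags)


if __name__ == "__main__":
    sample = "kə-'tər"  # taken from the phonetic transcription in the first post
    print(words(sample, ascii_only=False))  # ['kə', 'tər']
    print(words(sample, ascii_only=True))   # ['k', 't', 'r']
```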
-
armsys
by armsys » Fri May 17, 2013 9:57 pm
PCRE appears to support:
Capturing parentheses: (⋯) \1 \2...
Why does MS (Macro Scheduler) not support \1 \2... in the replacement text?
Instead, it requires $1 $2...
Is MS's RegEx different from the standard PCRE flavor?
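In PCRE, `\1`, `\2`, ... are backreferences used inside the pattern itself. The classic PCRE (8.x) API provides matching but no substitution function, so how a replacement string refers to captured groups is generally decided by the host application; VBScript's `RegExp.Replace`, for example, uses `$1`. The Python sketch separates the two roles. The helper `dollar_to_python` is hypothetical and only shows that `$1` and Python's `\g<1>` are two spellings of the same idea.

```python
import re


def doubled_letters(text: str) -> list[str]:
    r"""Backreference inside the pattern: (\w)\1 matches a character twice in a row."""
    return [m.group(0) for m in re.finditer(r"(\w)\1", text)]


def dollar_to_python(repl: str) -> str:
    r"""Hypothetical helper: rewrite $1, $2, ... as Python's \g<1>, \g<2>, ..."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def swap_words(text: str) -> str:
    """Group references in the replacement string, written in $n style."""
    return re.sub(r"(\w+) (\w+)", dollar_to_python("$2 $1"), text)


if __name__ == "__main__":
    print(doubled_letters("book keeper"))  # ['oo', 'ee']
    print(swap_words("hello world"))       # world hello
```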
-
armsys
by armsys » Fri May 17, 2013 10:34 pm
EASYPATTERNS
Using EasyPatterns results in the same bug:
Let>Text=provocateur [ prō-ˌvä-kə-'tər ] pro•vo•ca•teur
Let>pattern=[space]
Let>Replace=%SPACE%
RegEx>pattern,Text,1,matches,numMatches,1,Replace,Text1
MDL>%Text% => %Text1%
Conclusion:
Both PCRE and EasyPatterns will transform and corrupt the original (source) text even though it's supposed to do NOTHING!
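Marcus's reply later in this thread explains that EasyPatterns converts its expression into a regular expression and passes it to the same engine that RegEx> uses, so the two front-ends cannot give different results. The toy translator below illustrates that architecture only; its token table (`[space]`, `[digit]`) is a hypothetical stand-in, not DataMystic's actual EasyPatterns grammar, and literal text is not escaped.

```python
import re

# Hypothetical token table; the real EasyPatterns grammar is far richer.
EASY_TOKENS = {
    "[space]": " ",
    "[digit]": r"\d",
}


def easy_to_regex(easy: str) -> str:
    """Translate a (toy) EasyPatterns-style expression into a regex string."""
    out = easy
    for token, regex in EASY_TOKENS.items():
        out = out.replace(token, regex)
    return out


def replace(pattern: str, repl: str, text: str, easy: bool = False) -> str:
    """One regex engine behind both front-ends."""
    if easy:
        pattern = easy_to_regex(pattern)
    return re.sub(pattern, repl, text)


if __name__ == "__main__":
    print(replace("[space]", "_", "a b c", easy=True))  # a_b_c
    print(replace(" ", "_", "a b c"))                   # a_b_c
```

Any corruption introduced outside the engine would therefore show up identically through either front-end.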
-
armsys
by armsys » Fri May 17, 2013 10:47 pm
Is there any alternative?
Can it be done with VBScript?
Please show us a VBScript sample.
Thanks in advance for your help.
-
armsys
- Automation Wizard
- Posts: 1108
- Joined: Wed Dec 04, 2002 10:28 am
- Location: Hong Kong
Post
by armsys » Fri May 17, 2013 11:32 pm
My RegEx VBScript works successfully:
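VBSTART
' SR: regex search-and-replace using the VBScript RegExp object
Function SR(Srh,Rep,Source)
  Set objRegEx = CreateObject("VBScript.RegExp")
  objRegEx.Global = True       ' replace every match, not just the first
  objRegEx.IgnoreCase = True   ' case-insensitive matching
  objRegEx.Pattern = Srh
  SR = objRegEx.Replace(Source,Rep)
End Function
VBEND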
Let>Text=provocateur [ prō-ˌvä-kə-'tər ] pro•vo•ca•teur
Let>pattern=%SPACE%
Let>Replace=%SPACE%
VBEval>SR("%Pattern%","%Replace%","%Text%"),Result
MDL>%Text% => %Result%
The VBScript code above doesn't corrupt the source text.
This shows that the bug lies within Macro Scheduler itself and has nothing to do with PCRE or EasyPatterns.
That is, Macro Scheduler corrupts the source text.
Last edited by armsys on Sat May 18, 2013 6:37 am, edited 1 time in total.
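The VBScript `SR` function above sets `Global = True` (replace every match) and `IgnoreCase = True`. As a rough cross-check of those two settings, a Python equivalent is `re.sub` with `count=0` and `re.IGNORECASE`; the name `sr` is just a mirror of the VBScript function. Because Python strings are Unicode, the space-for-space replacement leaves the phonetic text intact, the same outcome armsys reports for the VBScript version. One difference to keep in mind: VBScript replacement strings refer to groups as `$1`, while Python uses `\1` or `\g<1>`.

```python
import re

TEXT = "provocateur [ prō-ˌvä-kə-'tər ] pro•vo•ca•teur"


def sr(search: str, rep: str, source: str) -> str:
    """Mirror of the VBScript SR() helper.

    count=0 corresponds to RegExp.Global = True (replace all matches);
    re.IGNORECASE corresponds to RegExp.IgnoreCase = True.
    """
    return re.sub(search, rep, source, count=0, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(sr(" ", " ", TEXT) == TEXT)  # True
    print(sr("DOC", "x", "doc Doc"))   # x x
```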
-
armsys
by armsys » Sat May 18, 2013 6:35 am
After rewriting the script a thousand times, I finally discovered and confirmed another fatal bug, first reported on Jan 6, 2005.
The FATAL error generated by MS/VBScript:
Microsoft VBScript compilation error:1033
Unterminated string constant
Line 7, Column 38
The error text above is misleading. It turns out that Macro Scheduler imposes a severe limitation on strings processed by VBScript: no string may contain a CRLF (and probably no quotes either). In real-world programming, that is impractical and unreasonable.
Marcus, after 7 years, would you please take a serious look into the no-CRLF string limitation imposed on VBScript?
Asking Macro Scheduler programmers to remove the CRLF is unacceptable.
Is this limitation actually imposed by Delphi or VBScript?
None of the Macro Scheduler manuals has ever mentioned this time-wasting limitation/bug.
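In VBScript a string literal cannot span lines, and a double quote inside a literal must be written as two double quotes. Text pasted verbatim between quotes in generated VBScript source therefore breaks on any line break or embedded quote. The Python sketch (the function name `to_vbs_literal` is hypothetical) turns arbitrary text into a valid VBScript expression: it doubles embedded quotes and joins the pieces around each line break with VBScript's `vbCrLf`, `vbCr` or `vbLf` constant using the `&` operator, so the generated code contains no raw line breaks.

```python
import re

# Map each kind of line break to the matching VBScript constant.
_BREAKS = {"\r\n": "vbCrLf", "\r": "vbCr", "\n": "vbLf"}


def to_vbs_literal(text: str) -> str:
    """Return a VBScript expression that evaluates to `text`."""
    # The capturing group keeps the separators: [text, sep, text, sep, text, ...]
    pieces = re.split(r"(\r\n|\r|\n)", text)
    parts = []
    for i, piece in enumerate(pieces):
        if i % 2:  # odd indices are the captured line breaks
            parts.append(_BREAKS[piece])
        else:      # even indices are text: double any embedded quotes
            parts.append('"' + piece.replace('"', '""') + '"')
    return " & ".join(parts)


if __name__ == "__main__":
    print(to_vbs_literal("this is line one\r\nthis is line two"))
    # "this is line one" & vbCrLf & "this is line two"
    print(to_vbs_literal('say "hi"'))
    # "say ""hi"""
```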
-
armsys
by armsys » Sat May 18, 2013 7:13 am
I find SetFocus>'s RegEx support extremely useful, in the sense that it can shorten a hundred lines into a single line.
For example,
Let>Pattern=^(.+\.DOC.*|Document\d+) - WORD$
Marcus, I have the impression that the RegEx flavor used by SetFocus is that of Microsoft VBScript/VB.NET, neither PCRE nor EasyPatterns. In fact, it isn't case sensitive, which also surprises me.
Can you confirm it?
-
armsys
by armsys » Sat May 18, 2013 7:18 am
Nonetheless, I discovered that SetFocus>%Pattern% takes significantly longer to locate the window than, say, SetFocus>Document1 - Word.
Can you optimize it? Thanks.
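Marcus's answer in this thread explains the slowdown: the window list is walked and the regex is evaluated against each title until one matches, whereas an exact title needs only a cheap string comparison per window. The Python sketch models both with a plain list of titles standing in for the real window list (an assumption; no Windows API is called, and SetFocus> may differ in detail), using armsys's pattern case-insensitively. Compiling the pattern once keeps parsing out of the loop, but the per-title matching cost remains.

```python
import re
from typing import Optional

PATTERN = r"^(.+\.DOC.*|Document\d+) - WORD$"


def find_exact(titles: list[str], wanted: str) -> Optional[str]:
    """Exact title comparison: one cheap string test per window."""
    for title in titles:
        if title == wanted:
            return title
    return None


def find_regex(titles: list[str], pattern: str) -> Optional[str]:
    """Regex title match: the pattern runs against each title in turn."""
    rx = re.compile(pattern, re.IGNORECASE)  # compiled once, outside the loop
    for title in titles:
        if rx.search(title):
            return title  # stop at the first matching window
    return None


if __name__ == "__main__":
    titles = ["Inbox - Outlook", "report.docx - Word", "Document1 - Word"]
    print(find_regex(titles, PATTERN))              # report.docx - Word
    print(find_exact(titles, "Document1 - Word"))   # Document1 - Word
```

Because the search stops at the first hit, a match near the front of the list returns quickly, consistent with Marcus's point that the further down the list the match is, the longer it takes.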
-
Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
by Marcus Tettmar » Mon May 20, 2013 2:49 pm
Erm, gosh, hard to keep up with you.
Apologies if I miss something:
1. I've already said we'll investigate the unicode issues with RegEx and see if there is a solution. Please be patient. This isn't going to happen overnight.
2. Re CRLF in VBScript strings: I dealt with that this morning. If you put a hard CRLF in a string passed to VBScript, VBScript will not like it, as it looks like two unterminated strings. So we need to replace the hard CRLFs with VBScript's vbCRLF constant:
StringReplace>string,CRLF,vbCRLF,string
I'm sure there are lots of examples like this on the forums.
The error is not misleading. Consider a string with a hard line break in it and imagine we have quoted it for VBScript. We would end up with:
"this is line one
this is line two"
Now, go and put that into some VBScript code. E.g. try this:
Dim mystring
mystring = "this is line one
this is line two"
Of course you'll get a syntax error. What is the error? Well, there are two errors: the first line is an unterminated string, and the next line is not valid VBScript either.
So you can see how hard breaks cause unterminated string errors.
Regardless of whether you think this error is well stated or not, it is defined by Microsoft and there's nothing we at MJT Net can do about it. VBScript is a Microsoft technology. VBScript comes with your operating system. Macro Scheduler simply makes use of it. We can't change it. We can only use it.
Since it is syntactically different to the native language in Macro Scheduler we naturally have to do some conversions sometimes. E.g. strings don't need to be delimited in MacroScript, they do in VBScript and so on. We simply need to know what the rules are and work with them.
Hope this makes sense.
3. Using regex in window matching is bound to take longer. By definition, as well as looping through the entire list of existing windows, it also has to run the regex against each one until a match is found. I would EXPECT it to take longer. The more complicated the pattern, the more windows on the system, and the further down the list the match is, the longer it is going to take. No, there's not much we can do to optimize it.
4. EasyPatterns simply converts a more natural English expression into a RegEx expression. It then passes that expression to the same RegEx engine that RegEx uses. So you won't get any different results.
Again, as I promised earlier, we will investigate the unicode issue and see if there is anything we can do about it. I've promised you that and it is in our issue tracker and on the work list. At this point I can't promise you any more than that. It is in hand.
-
armsys
by armsys » Mon May 20, 2013 8:44 pm
Marcus Tettmar wrote: Apologies if I miss something: ...
I'm the one who should apologize to you.
Marcus Tettmar wrote: "EasyPatterns simply converts a more natural English expression into a RegEx expression."
On the other hand, it demands time-consuming learning and memorization. It isn't documented anywhere other than
http://www.datamystic.com/easypatterns_reference.html, and the knowledge doesn't carry over to VBScript, C++, Java, Ruby, ... It doesn't support the full range of PCRE features. Why not go straight to PCRE? Whether it's redundant, I leave it to Macro Scheduler users for further debate.
Marcus Tettmar wrote: 1. I've already said we'll investigate the unicode issues with RegEx and see if there is a solution.
Thanks, Marcus. I repeat: both PCRE and EasyPatterns corrupt Unicode text. Somehow, internally, it performs some transformation and renders characters it cannot resolve as ? (question marks).
-
Marcus Tettmar
by Marcus Tettmar » Mon May 20, 2013 8:55 pm
armsys wrote: Why not go straight to PCRE? Whether it's redundant, I leave it to Macro Scheduler users for further debate.
Because for some things, and for some people, EasyPatterns is an awful lot simpler to understand than RegEx. It is NOT redundant, but you do not have to use it. Not everyone will consider it time-wasting. You might as well say that all the other functions you don't use are a waste of time. That's a bit unfair. Plenty of others find it useful.
armsys wrote:
Marcus Tettmar wrote: 1. I've already said we'll investigate the unicode issues with RegEx and see if there is a solution.
Thanks, Marcus. I repeat: both PCRE and EasyPatterns corrupt unicode text.
You really don't need to repeat yourself! I understand. We are investigating. Believe me. We will investigate, seek to understand the cause, and assuming it is possible to do so we will fix it. Please, no need to repeat yourself. I am not ignoring you.