How Do You Identify a Unicode File?

JRL
Automation Wizard
Posts: 3532
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Post by JRL » Mon Jul 02, 2012 8:38 pm

If we look at a Macro Scheduler script file in a hex editor, we can see bytes 255 and 254 (hex FF FE) at the start of the file, and every other byte after that is 0. From the little I know, this is the normal pattern of a Unicode file: FF FE is the byte order mark of UTF-16 little-endian, which stores each ASCII-range character as two bytes, the second being 0. We see the same pattern if we save a file as Unicode from Notepad.
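For anyone without a hex editor to hand, the leading bytes described above can be listed with a few lines of VBScript. This is only a sketch: `FirstBytes` is a made-up helper name, and it assumes the `ADODB.Stream` COM object is available on the machine (it normally is on Windows). Inside a Macro Scheduler script it would need to sit between VBSTART and VBEND.

```vbscript
' Hypothetical helper (not part of Macro Scheduler): returns the first
' "count" bytes of a file as space-separated decimal values, for example
' "255 254 76 0 101 0" for a Unicode file whose text starts with "Le".
Function FirstBytes(filename, count)
   Const adTypeBinary = 1
   Dim stream, bytes, i, result
   result = ""
   Set stream = CreateObject("ADODB.Stream")
   stream.Type = adTypeBinary          ' read raw bytes, no text decoding
   stream.Open
   stream.LoadFromFile filename
   If stream.Size > 0 Then             ' Read returns Null on an empty file
      bytes = stream.Read(count)       ' returns at most "count" bytes
      For i = 1 To LenB(bytes)
         result = result & AscB(MidB(bytes, i, 1)) & " "
      Next
   End If
   stream.Close
   FirstBytes = Trim(result)
End Function
```

Because the stream is opened in binary mode, the values come back exactly as they are stored on disk, regardless of the system code page.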

If we ReadFile> an executable file (or any other binary file) and then WriteLn> that imported data to a new file (with WLN_NOCRLF set to 1), the newly written executable works fine. Its binary integrity is maintained, so WriteLn> appears to be capable of handling every byte value, not just ASCII text.

If, on the other hand, we ReadFile> a Macro Scheduler script file and then immediately WriteLn> that imported data to a new file, the Unicode format is gone and the file is changed.

Back in Sept. 2009 Marcus stated: "WriteLn writes ANSI files..." and... "we're working on adding an option to make WriteLn save Unicode."

From there:

Let>WLN_ENCODING=UNICODE

came to be.

Setting this variable to UNICODE allows WriteLn> to write a Unicode-formatted file. Unfortunately, if we ReadFile> and then WriteLn> that aforementioned executable file while WLN_ENCODING is set to UNICODE, the executable is converted to Unicode format: bytes 255 254 are inserted at the beginning of the file and every other byte of the file is followed by a 0 byte.
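To make that byte pattern concrete, here is a small illustration in VBScript. It is not how WriteLn> works internally; `ToUtf16LeBytes` is a hypothetical function that only models the result for byte values in the ASCII range (0 to 127). Bytes above 127 depend on the system code page and are not covered.

```vbscript
' Hypothetical illustration: given byte values in the ASCII range (0-127),
' return the byte sequence the same text would have as UTF-16
' little-endian with a byte order mark, as space-separated decimals.
Function ToUtf16LeBytes(values)
   Dim result, i
   result = "255 254"                            ' byte order mark FF FE
   For i = LBound(values) To UBound(values)
      result = result & " " & values(i) & " 0"   ' original byte, then a 0 byte
   Next
   ToUtf16LeBytes = result
End Function

' A Windows executable starts with the letters "MZ" (bytes 77 90).
' ToUtf16LeBytes(Array(77, 90)) returns "255 254 77 0 90 0"
```

A file rewritten this way no longer starts with the two bytes Windows expects at the beginning of an executable, which is consistent with the executable being broken after the conversion.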

Is there a way to determine programmatically whether or not a file is Unicode so we know whether or not to set WLN_ENCODING to UNICODE?

Post by JRL » Tue Jul 10, 2012 7:17 pm

This seems to work with script files.

Input>data,Select a script file

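VBSTART
' Returns True when the first byte of the file is 239 or above (239 starts a
' UTF-8 BOM, 254 and 255 start a UTF-16 BOM) or is 0; otherwise returns False.
' Any first byte from 239 to 255 matches, not only genuine BOM bytes.
Function IsUnicode(filename)
   Const ForReading = 1
   Dim objFSO, objTextFile, firstChar
   IsUnicode = False
   Set objFSO = CreateObject("Scripting.FileSystemObject")
   ' Opened as ANSI text (the default), so BOM bytes arrive as characters
   Set objTextFile = objFSO.OpenTextFile(filename, ForReading)
   ' An empty file has nothing to read and is reported as not Unicode
   If Not objTextFile.AtEndOfStream Then
      ' Only the first character is needed, so read one instead of a whole line
      firstChar = objTextFile.Read(1)
      If Asc(firstChar) <= 0 Or Asc(firstChar) >= 239 Then
         IsUnicode = True
      End If
   End If
   objTextFile.Close
End Function
VBEND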

VBEval>IsUnicode("%data%"),res

mdl>res
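A stricter variation on the IsUnicode idea is to read the raw bytes and compare them against the known byte order marks, so that a file which merely starts with some high byte is not reported as Unicode. The following is a sketch under assumptions: `BomEncoding` is a hypothetical name, `ADODB.Stream` is assumed to be available, and a file saved without a BOM will not be recognised at all.

```vbscript
' Hypothetical helper: guesses a file's encoding from its byte order mark.
' Returns "UTF-16LE", "UTF-16BE", "UTF-8", or "" when no BOM is found.
Function BomEncoding(filename)
   Const adTypeBinary = 1
   Dim stream, bytes, b1, b2, b3
   BomEncoding = ""
   b1 = -1 : b2 = -1 : b3 = -1          ' -1 means "byte not present"
   Set stream = CreateObject("ADODB.Stream")
   stream.Type = adTypeBinary           ' raw bytes, no text decoding
   stream.Open
   stream.LoadFromFile filename
   If stream.Size > 0 Then
      bytes = stream.Read(3)            ' returns fewer bytes for short files
      If LenB(bytes) >= 1 Then b1 = AscB(MidB(bytes, 1, 1))
      If LenB(bytes) >= 2 Then b2 = AscB(MidB(bytes, 2, 1))
      If LenB(bytes) >= 3 Then b3 = AscB(MidB(bytes, 3, 1))
   End If
   stream.Close
   If b1 = 255 And b2 = 254 Then
      BomEncoding = "UTF-16LE"          ' FF FE, the pattern seen in script files
   ElseIf b1 = 254 And b2 = 255 Then
      BomEncoding = "UTF-16BE"          ' FE FF
   ElseIf b1 = 239 And b2 = 187 And b3 = 191 Then
      BomEncoding = "UTF-8"             ' EF BB BF
   End If
End Function
```

It could be called with VBEval> in the same way as IsUnicode, and WLN_ENCODING set to UNICODE only when the result is UTF-16LE. A binary file that happens to begin with FF FE would still be misidentified, so this narrows the guess rather than proving anything.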

armsys
Automation Wizard
Posts: 1108
Joined: Wed Dec 04, 2002 10:28 am
Location: Hong Kong

Re: How Do You Identify a Unicode File?

Post by armsys » Wed Jun 05, 2013 9:51 pm

Hi JRL,
JRL wrote: If we ReadFile> an executable file (or any other binary file) and then WriteLn> that imported data to a new file (with WLN_NOCRLF set to 1), the newly written executable works fine. Its binary integrity is maintained, so WriteLn> appears to be capable of handling every byte value, not just ASCII text.
It doesn't seem your original issue--writing back a non-Unicode executable back to disk--has been resolved, has it?

Post by JRL » Wed Jun 05, 2013 10:04 pm

Yes, as far as I am concerned the posted "IsUnicode" VBScript resolved my problem. The issue was simply knowing whether a file was Unicode or not, and the "IsUnicode" VBScript seems to identify Unicode files accurately.

"--writing back a non-Unicode executable back to disk--" was never a problem. You do need to set the Macro Scheduler variable "WLN_NOCRLF" equal to 1. This prevents WriteLn> from appending carriage return and line feed characters to the end of the file, which would destroy the file's integrity.

Hope this makes sense.

Post by armsys » Wed Jun 05, 2013 10:22 pm

JRL wrote: "--writing back a non-Unicode executable back to disk--" was never a problem. You do need to set the Macro Scheduler variable "WLN_NOCRLF" equal to 1. This prevents WriteLn> from appending carriage return and line feed characters to the end of the file, which would destroy the file's integrity.
Hi JRL,
Thanks for your lightning fast reply.
Thanks for sharing the secret of the WLN_NOCRLF thing. I didn't know it previously.
Thanks a million.

Post by JRL » Wed Jun 05, 2013 10:28 pm

armsys wrote: Thanks for sharing the secret of the WLN_NOCRLF thing.
You're welcome. Without that feature you didn't stand a chance. But the thanks go to Marcus for developing world-class software.
