November 11, 2015

Fix Unicode file with missing BOM

Filed under: Scripting — Marcus Tettmar @ 10:39 am

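Macro Scheduler's ReadFile and ReadLn functions understand ANSI, UTF8 and Unicode files, as long as the UTF8 and Unicode ones have a valid BOM header. But a client recently needed to read in a Unicode file with a missing BOM. So we wrote a little bit of VBScript which reads the file in as Unicode text and then writes it back out over the original as a UTF8 encoded file. ADODB.Stream adds the BOM when it saves UTF8 text, so the result is something ReadFile can recognise.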

Here’s the code:

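VBSTART
'Re-save a BOM-less Unicode (UTF-16) file as UTF-8, overwriting the original
Sub UTFConvert(filename)
  Dim fso, txt, stream
  Set fso = CreateObject("Scripting.FileSystemObject")
  'Open for reading (1), don't create (False), treat content as Unicode (-1)
  txt = fso.OpenTextFile(filename, 1, False, -1).ReadAll
  Set stream = CreateObject("ADODB.Stream")
  stream.Open
  stream.Type     = 2 'text
  stream.Position = 0
  stream.Charset  = "utf-8"
  stream.WriteText txt
  stream.SaveToFile filename, 2 '2 = overwrite existing file
  stream.Close
End Sub
VBEND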

//Convert it to UTF8
VBRun>UTFConvert,%SCRIPT_DIR%\data.txt

//Now we can read it :-)
ReadFile>%SCRIPT_DIR%\data.txt,theFileData
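
One thing to watch: UTFConvert assumes the file really is Unicode. If it is run against a file that is already UTF8, the text would be read with the wrong encoding and come out garbled. A possible safeguard, sketched below, is to look at the first few bytes and only convert when no BOM is found. HasBOM and UTFConvertIfNoBOM are hypothetical helper names, not part of the original script, and the sketch still assumes that any file without a BOM is Unicode rather than ANSI.

```vbscript
'Hypothetical helper (not in the original script):
'returns True if the file starts with a UTF-16 or UTF-8 BOM.
Function HasBOM(filename)
  Dim stream, head, b1, b2, b3
  HasBOM = False
  Set stream = CreateObject("ADODB.Stream")
  stream.Type = 1 'binary
  stream.Open
  stream.LoadFromFile filename
  If stream.Size >= 2 Then
    head = stream.Read(3) 'up to 3 bytes from the start of the file
    b1 = AscB(MidB(head, 1, 1))
    b2 = AscB(MidB(head, 2, 1))
    If (b1 = &HFF And b2 = &HFE) Or (b1 = &HFE And b2 = &HFF) Then
      HasBOM = True 'UTF-16 little or big endian
    ElseIf LenB(head) >= 3 Then
      b3 = AscB(MidB(head, 3, 1))
      If b1 = &HEF And b2 = &HBB And b3 = &HBF Then HasBOM = True 'UTF-8
    End If
  End If
  stream.Close
End Function

'Hypothetical wrapper: only convert when no BOM was found.
'Relies on the UTFConvert Sub from the script above.
Sub UTFConvertIfNoBOM(filename)
  If Not HasBOM(filename) Then UTFConvert filename
End Sub
```

Both routines would need to sit inside the same VBSTART/VBEND block as UTFConvert. The wrapper could then be called in the same way as before, with VBRun>UTFConvertIfNoBOM,%SCRIPT_DIR%\data.txt in place of the original VBRun line.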