MODI OCR: Improving ability to read text - insights

gdyvig
Automation Wizard
Posts: 447
Joined: Fri Jun 27, 2008 7:57 pm
Location: Seattle, WA

Post by gdyvig » Thu May 21, 2009 10:56 pm

No OCR package is 100% accurate or reliable. We use MODI OCR, which comes with Microsoft Office. It is pretty good but has problems with high-contrast graphics near the text and with too little space around the text. Symptoms are inaccurate text retrieval or no text being returned at all.

In addition to the captured text, MODI can return other information about the bmp or jpg file passed to it, such as the image size. I use the size as a clue when no text is returned: if the size comes back but the text does not, the VBScript ran and MODI opened the image, so the problem is with the OCR rather than the script.

Often you want to capture the text in a very specific area of the screen and not have it confused with other text. So you use Image Recognition to find a text field label, take a screen capture of the adjacent field, and use DoOCR to read the text. This post shows the same text in several scenarios to illustrate where MODI works and where it does not.

Here is my DoOCR script:


VBSTART
Function DoOCR(bitmapfile)
  Dim miDoc
  Dim miLayout
  Dim stringOut
  Dim miImg
  Dim PixelWidth
  Dim PixelHeight
  ' Errors are deliberately suppressed: if OCR fails, the function still
  ' returns whatever image size it could read, with empty text.
  On Error Resume Next
  Set miDoc = CreateObject("MODI.Document")
  miDoc.Create bitmapfile
  ' Perform OCR.
  ' You can change the mouse pointer here to an hourglass or something.
  miDoc.Images(0).OCR
  ' Change the mouse back to the normal default.
  Set miLayout = miDoc.Images(0).Layout
  stringOut = miLayout.Text
  'MsgBox(stringOut)

  ' Read the image size so the caller can tell whether the file was opened.
  Set miImg = miDoc.Images(0)
  PixelWidth = miImg.PixelWidth
  PixelHeight = miImg.PixelHeight
  ' Result format: "width,height text"
  DoOCR = PixelWidth & "," & PixelHeight & " " & stringOut
  Set miImg = Nothing
  Set miLayout = Nothing
  Set miDoc = Nothing
End Function
VBEND
//Need minimum 3 pixel border around text
Let>sImageFileBmpOrJPG=C:\MJT_Log\_testOCR.bmp

VBEval>DoOCR("%sImageFileBmpOrJPG%"),sImagePixelSizeAndTextFound
MDL>%sImagePixelSizeAndTextFound%
Note that it will return ,
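
Because DoOCR builds its result as width, comma, height, space, text, the size prefix can be checked before trusting the text. The sketch below is one possible way to do that. ClassifyOCRResult and its three status strings are hypothetical names, not part of the original script. It assumes the result format shown in DoOCR above, where a failure to open the image leaves both size fields empty.

```vbscript
' Hypothetical helper, not part of the original post.
' Classifies the "width,height text" string that DoOCR returns.
Function ClassifyOCRResult(result)
  Dim spacePos, sizePart, textPart, dims
  ' The size prefix ends at the first space; the OCR text follows it.
  spacePos = InStr(result, " ")
  If spacePos = 0 Then
    sizePart = result
    textPart = ""
  Else
    sizePart = Left(result, spacePos - 1)
    textPart = Mid(result, spacePos + 1)
  End If
  ' Remove line breaks before the emptiness check (Trim leaves them in).
  textPart = Trim(Replace(Replace(textPart, vbCr, ""), vbLf, ""))
  dims = Split(sizePart, ",")
  If UBound(dims) < 1 Then
    ClassifyOCRResult = "NOIMAGE"   ' no "width,height" prefix at all
  ElseIf dims(0) = "" Or dims(1) = "" Then
    ClassifyOCRResult = "NOIMAGE"   ' size fields empty: image not opened
  ElseIf textPart = "" Then
    ClassifyOCRResult = "NOTEXT"    ' image opened, nothing recognized
  Else
    ClassifyOCRResult = "OK"
  End If
End Function
```

The function would sit between VBSTART and VBEND next to DoOCR, and could then be called in the same style as the original script:

VBEval>ClassifyOCRResult(DoOCR("%sImageFileBmpOrJPG%")),sOCRStatus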

In all of the images below the text is 8 pixels high. The differences are in the image height and size/color of the border.

Here is an image that contains only the screen text and no adjacent graphics or differently colored background:

TestOCR1: Readable, 86x36 pixels, white area 17 pixels high
TestOCR2: Not readable, 86x17 pixels
TestOCR3: Not readable, 86x21 pixels
TestOCR4: Not readable, 86x21 pixels
TestOCR5: Readable, 86x36 pixels
TestOCR6: Readable, 86x36 pixels
TestOCR7: Not readable, 86x36 pixels

Conclusions:

1. Include sufficient background pixels above and below the expected text.
2. OCR works best when the contrast between the background and the border is low.
3. Choose a light-colored screen background and field highlighting.
4. Choose a font color that contrasts with the background.
5. Capture a minimum-size image and add a low-contrast border.
6. Manipulate image colors and intensities.
7. A third-party image manipulation tool will help.
8. Enhancing Macro Scheduler to include such a tool will help.
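
Conclusion 1 can be applied when working out the capture coordinates. In the test images the text is 8 pixels high and the readable samples are 36 pixels high, which is 14 pixels above and below if the text is vertically centered (that centering is an assumption). The sketch below grows a capture rectangle by a chosen margin. PadCaptureRect is a hypothetical helper, and applying the same margin on all four sides is a simplification. TestOCR7 shows that height alone does not guarantee a readable result.

```vbscript
' Hypothetical helper, not part of the original post.
' Grows a capture rectangle by "margin" pixels on every side and
' returns it as "x1,y1,x2,y2". The top-left corner is kept at 0 or above.
Function PadCaptureRect(x1, y1, x2, y2, margin)
  Dim nx1, ny1
  nx1 = x1 - margin
  If nx1 < 0 Then nx1 = 0
  ny1 = y1 - margin
  If ny1 < 0 Then ny1 = 0
  PadCaptureRect = nx1 & "," & ny1 & "," & (x2 + margin) & "," & (y2 + margin)
End Function
```

For example, a field found at 100,200 to 186,208 with a margin of 14 gives 86,186,200,222. The four values can then be fed to whatever screen capture step the script already uses.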

If you have additional insights, please add them to this thread.


Hope people find this useful,

Gale
Last edited by gdyvig on Thu May 21, 2009 11:10 pm, edited 3 times in total.

Me_again
Automation Wizard
Posts: 1101
Joined: Fri Jan 07, 2005 5:55 pm
Location: Somewhere else on the planet

Post by Me_again » Thu May 21, 2009 11:06 pm

I see these are .jpg's. Are you using .jpg's or .bmp's (or both)?

Jpgs or bmps.

Post by gdyvig » Thu May 21, 2009 11:14 pm

In my actual scripts I usually use jpgs to keep file size down. I used bmp's for the images you see in this thread.

Gale

Re: Jpgs or bmps.

Post by Me_again » Fri May 22, 2009 1:36 pm

gdyvig wrote: In my actual scripts I usually use jpgs to keep file size down. I used bmp's for the images you see in this thread.

Gale

Interesting. I don't use OCR, but I would have expected it to have problems with the image compression in .jpg's. I guess it's smarter than I thought.

Looking at your image links, another internet resource you may find useful is tinyurl.com.
