June 6, 2006

Screen OCR – Recognising Graphical Text

Filed under: Automation,Scripting — Marcus Tettmar @ 10:26 am

[UPDATE: 27.3.2018 – Since this article was written, Macro Scheduler has gained built-in Screen OCR functions]

We often need to write scripts that extract text from a window or some area of the screen. Usually this can be achieved via the clipboard, with commands such as GetWindowText and GetControlText, or via a Win32 API function. However, this only works with text that really is text, and with objects that expose their text as a property. For example, many Windows controls have a caption property which exposes the text associated with the control. In this instance we can write code to capture the text by accessing that caption property.
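
As a rough sketch of this "real text" case, the following reads text from Notepad with two of the commands just mentioned. The window title Notepad*, the control class Edit and the instance number 1 are assumptions chosen purely for illustration, and the parameter order shown should be checked against the command reference for your version of Macro Scheduler:

//Sketch only: read text that is exposed as real text
//Assumes a Notepad window is open; title, class and instance are illustrative
SetFocus>Notepad*

//All text exposed by the window and its controls
GetWindowText>Notepad*,WinText

//Text of the first Edit control in that window
GetControlText>Notepad*,Edit,1,EditText

//Just as an example display the control text
MessageModal>%EditText%

If both variables come back empty even though you can see text in the window, that is a good sign the text is being drawn as graphics.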

But don’t forget that anything you see on the screen is really graphics. Text is rendered using fonts and fonts are just made up of a combination of coloured pixels. Not all text that you see on the screen is accessible via a simple object property. Sometimes the text you see is just pure graphics. Yes, it reads like text to us humans – we can make sense of it. But to the computer it is a sequence of dots. An arrangement of pixels. So how can we write a script to ‘read’ that text?

Well, in these cases, where the text is not exposed as a textual object and is only represented graphically, the only way forward is to use Optical Character Recognition (OCR). There are many OCR packages on the market which will read an image file, scan it for recognised characters and output ASCII text. So one approach would be to automate one of these OCR applications with Macro Scheduler. Macro Scheduler could first take a screenshot (using the ScreenCapture function) and then automate an OCR application, having it read the screenshot and output the text to a file.
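
A minimal sketch of that two-step approach might look like the following. Note that ocrtool.exe, its argument order and the c:\temp paths are entirely hypothetical stand-ins for whichever OCR package you have; a real package may need its user interface automated rather than a simple command line:

//Sketch only: screenshot the active window, then hand it to an OCR tool
GetActiveWindow>title,x1,y1,w,h
Let>x2=x1+w
Let>y2=y1+h

//Save the window area as a bitmap
ScreenCapture>%x1%,%y1%,%x2%,%y2%,c:\temp\shot.bmp

//Wait for the OCR tool to finish before reading its output
Let>RP_WAIT=1

//HYPOTHETICAL tool and arguments - substitute your own OCR package
Run>c:\ocr\ocrtool.exe c:\temp\shot.bmp c:\temp\shot.txt

ReadFile>c:\temp\shot.txt,TheText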

But a less cumbersome approach is to use an excellent tool called Textract from Structurise. Textract is a library that will scan a window, screen area or image file, and output recognised text. Textract comes with a command line interface that makes it ideal for use within a Macro Scheduler script. Here is a small example script which scans the foreground window and retrieves all the text within it:

//Path to Textract
Let>TxPath=c:\textract

//Focus the App if necessary
//SetFocus>Notepad*

//Get bounds of active window
GetActiveWindow>title,x1,y1,w,h
Let>x2=x1+w
Let>y2=y1+h

//Set Run to run hidden and to wait for completion
Let>RP_WAIT=1
Let>RP_WINDOWMODE=0

//First time you run this on a new system uncomment next line
//Run>%TxPath%\textra.exe /build

//Run Textra to retrieve all text in window.  Make sure we run hidden!
Run>%TxPath%\textra.exe /capture %x1% %y1% %x2% %y2% /ascii %TxPath%\tmpOutput.txt

//Read from output file into script variable and delete file
ReadFile>%TxPath%\tmpOutput.txt,TheText
DeleteFile>%TxPath%\tmpOutput.txt

//Just as an example display the text
MessageModal>TheText

This script simply displays all the text found in the entire window in a message box. You would probably want to parse the text in some way, or you may only want text from a specific part of the window. For the latter, pass a smaller rectangle to /capture by adjusting the screen coordinates. This is a really slick way of reading text that isn’t otherwise available as text to Macro Scheduler or Windows.
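
Both ideas can be combined in a short sketch. It reuses the textra.exe options from the script above; the 100-pixel strip and the Total: label are made-up values for illustration, and it assumes textra.exe has already been run once with /build:

//Sketch only: scan the top strip of the active window and look for a label
Let>TxPath=c:\textract

GetActiveWindow>title,x1,y1,w,h
//Keep the full width but only the top 100 pixels of the window
Let>x2=x1+w
Let>y2=y1+100

Let>RP_WAIT=1
Let>RP_WINDOWMODE=0
Run>%TxPath%\textra.exe /capture %x1% %y1% %x2% %y2% /ascii %TxPath%\tmpOutput.txt

ReadFile>%TxPath%\tmpOutput.txt,TheText
DeleteFile>%TxPath%\tmpOutput.txt

//Find the label and, if present, take the 20 characters that follow it
Position>Total:,TheText,1,p
If>p>0
  //Total: is 6 characters long, so the value starts 6 characters further on
  Let>start=p+6
  MidStr>TheText,start,20,value
  MessageModal>Found: %value%
Else
  MessageModal>Label not found in recognised text
Endif

Bear in mind that OCR can misread characters, so an exact match on a label may occasionally fail even when the label is on screen.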

To get a list of options, type textra.exe at the command line, and see the documentation that comes with Textract.

You can download Textract from:
http://www.structurise.com/textract/