December 12, 2007

Capturing Screen Text

Filed under: Automation,Scripting — Marcus Tettmar @ 10:37 am

As I have mentioned previously, Macro Scheduler 10 introduces some powerful new commands for capturing screen text. In this post I aim to explain what kinds of text can be captured with these new commands and why there will always be some text that cannot be retrieved.

First, let’s look at how the existing functions, GetWindowText and GetObjectText, work in Macro Scheduler 9.x and below.

Open up Macro Scheduler and click on the Tools menu and then the “View System Windows” option. You’ll end up with a window that looks something like this:


What we are looking at is a tree representation of the windows open on the system. In the screenshot above, the highlighted line shows an object of class “Button” with the caption “Test Center”. Each line gives the current handle of the object, followed by its class name and then its caption text, if any.

This caption text belonging to an object is made available to other processes; it is published, if you will. An app can ask the control for its text by sending it a simple message. That is what Macro Scheduler is doing when it builds this list of windows and objects. Macro Scheduler enumerates all top-level windows and then, for each one, enumerates each of its child “windows”. Note that I use the term window interchangeably with object or control here: the controls that appear in the list are “windowed controls”, meaning they have window handles. A handle allows us to interact with the control. If we know its handle we can send it a message saying “please give me your text”, and we get the text of the control back. This is what GetObjectText and GetWindowText do.

There are a number of shortcomings to this approach. One is that the “caption text” that the object publishes is not always the text that you see on the screen. In the case of standard Buttons, Edits, Windows and Checkboxes the published text is usually what you see. But other objects don’t necessarily work the same way. We usually know where we are with common controls, the ones that belong to Windows, but custom controls in third-party software may not follow the same rules. A treeview’s caption, for example, is not the text of its nodes, even though the node text is what gets written to the screen. Furthermore, not all text belongs to windowed controls. In Delphi applications, a control class called TLabel is commonly used as a way to write text on a window. These are often used to label other controls like edit boxes. But TLabels are not windowed controls: they don’t have handles. So this technique will not be able to retrieve their text.
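To make the handle-based approach and its blind spot concrete, here is a toy model in Python. It is only a sketch: it does not call the Windows API and it is not how Macro Scheduler is implemented. The Control class, the get_text and enumerate_windowed helpers, the handle numbers and the sample captions are all invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Optional, Tuple

_handles = count(1000)  # made-up handle values, purely illustrative


@dataclass
class Control:
    """A toy control. Only windowed controls are given a handle."""
    class_name: str
    caption: str = ""
    windowed: bool = True
    children: List["Control"] = field(default_factory=list)
    handle: Optional[int] = None

    def __post_init__(self) -> None:
        if self.windowed:
            self.handle = next(_handles)


def enumerate_windowed(roots: List[Control],
                       depth: int = 0) -> Iterator[Tuple[int, Control]]:
    """Walk top-level windows, then their children, skipping handle-less controls."""
    for ctl in roots:
        if ctl.handle is not None:
            yield depth, ctl
            yield from enumerate_windowed(ctl.children, depth + 1)


def get_text(handle: int, registry: Dict[int, Control]) -> str:
    """Stand-in for sending a 'please give me your text' message to a handle."""
    return registry[handle].caption


# A hypothetical Delphi-style form: the label is drawn text with no handle.
form = Control("TForm", "Customer Details", children=[
    Control("TLabel", "Name:", windowed=False),
    Control("TEdit", "Alice"),
    Control("Button", "Test Center"),
])

registry = {ctl.handle: ctl for _, ctl in enumerate_windowed([form])}

# One line per windowed control: handle, class name, caption.
lines = [
    f"{'  ' * depth}{ctl.handle} {ctl.class_name} {get_text(ctl.handle, registry)}"
    for depth, ctl in enumerate_windowed([form])
]
print("\n".join(lines))
```

The listing contains the form, the edit box and the button, but “Name:” never appears: the label has no handle, so there is nothing to send the message to.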

We also can’t use this approach to get text from the likes of Word documents or Internet Explorer pages. This text is not just some simple caption property belonging to an ordinary control; the application draws it onto the screen itself.

When Windows writes text to the screen, it uses one of a number of functions deep within the Windows API. Most Windows applications will trigger these functions whether or not the programmer realises it. One such function is TextOut:

Windows GDI – TextOut
The TextOut function writes a character string at the specified location, using the currently selected font, background color, and text color.

Note that this function is part of gdi32, which is responsible for graphics (GDI stands for Graphics Device Interface). So TextOut is being called to “paint” a character string onto the screen.

With Macro Scheduler 10, when you call one of the new text capture commands, Macro Scheduler uses a “hook” to listen in on calls to TextOut and other similar functions. It is therefore able to intercept what is written to the screen and retrieve the text output by a window.

This works with all kinds of applications, including Microsoft Office, Internet Explorer, Firefox and the vast majority of other software. There are still some exceptions though. Remember that this works by hooking the low-level functions within Windows that are used to create text. The vast majority of Windows applications will use these system calls (often indirectly). However, some software may not. There’s no reason why a programmer can’t write text in an even lower-level way; they might decide to paint a word pixel by pixel.

As an example – Java applications written with the AWT or SWT frameworks write text using Windows API functions. So we can detect text from those. But if you have a Java app produced with the Swing libraries, which handle text output their own way, you’re not going to be able to capture the text from it.
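The hooking idea, and the reason some output escapes it, can be modelled in a few lines of Python. This is a toy sketch, not real GDI hooking and not Macro Scheduler’s implementation: ToyGdi, its text_out and set_pixel methods, and install_hook are all hypothetical names standing in for the real text output functions and the hook mechanism.

```python
from typing import Callable, List, Tuple


class ToyGdi:
    """Stand-in for a graphics layer with a text call and a pixel call."""

    def __init__(self) -> None:
        self.screen: List[Tuple[int, int, str]] = []

    def text_out(self, x: int, y: int, text: str) -> None:
        self.screen.append((x, y, text))

    def set_pixel(self, x: int, y: int) -> None:
        self.screen.append((x, y, "<pixel>"))


def install_hook(gdi: ToyGdi, captured: List[str]) -> Callable[[], None]:
    """Wrap text_out so every string is recorded before it is drawn.

    Returns a function that removes the hook again.
    """
    original = gdi.text_out

    def hooked(x: int, y: int, text: str) -> None:
        captured.append(text)   # listen in on the call...
        original(x, y, text)    # ...then let it proceed unchanged

    gdi.text_out = hooked  # the instance attribute shadows the method

    def uninstall() -> None:
        del gdi.text_out  # the class method becomes visible again

    return uninstall


gdi = ToyGdi()
captured: List[str] = []
uninstall = install_hook(gdi, captured)

gdi.text_out(10, 10, "File")   # an app that uses the text call
for x in range(3):             # an app that paints its "text" pixel by pixel
    gdi.set_pixel(x, 20)
uninstall()
gdi.text_out(10, 30, "Edit")   # drawn after the hook has been removed

print(captured)  # prints ['File']
```

Everything still reaches the toy screen, but only the string that passed through the hooked text call is captured. Output that takes another route, like the pixel painting here or Swing’s own text rendering, is invisible to the hook.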

Finally, what about text on images? Well, text on an image was already there. It was painted by the artist. It is set in stone. So text that appears in a jpg, bmp or any other image file cannot be detected with the new text capture commands, because it isn’t produced on the fly by one of Windows’ own text output functions.

The best way to determine whether or not the text you are seeing can be captured with the new text capture commands is to fire up the “Text Capture” sample macro. This will show you the text beneath your mouse cursor. So move the mouse over the text you are interested in and see if it gets displayed in the dialog. If it is, you know you can use the text capture commands to retrieve this text in your macro.

The only way to detect text that cannot be detected with the text capture commands is via OCR. Two methods to do this in Macro Scheduler are discussed here and here.