How to Use Image Recognition

Published on February 20, 2007 by Marcus Tettmar in Automation, Scripting

What is Image Recognition?

Update. Image Recognition has been vastly improved and simplified since this article was written and in v13 we introduced an Image Recognition Code Wizard. Watch this video to see how simple it is.

Image Recognition allows Macro Scheduler to find a bitmap (needle) in another, larger, bitmap (haystack) and return its position. The larger bitmap could be a snapshot of the screen, and the smaller bitmap could be a capture of a toolbar button or some other screen object. Therefore Image Recognition allows us to locate objects on the screen, graphically. We can also wait for images to appear on the screen. So Image Recognition allows us to automate any kind of graphical interface. We can make the script wait for images to appear on the screen, find screen objects and therefore send mouse events to the correct part of the screen to focus or control the objects.

The functions most used for Image Recognition are as follows:

WaitScreenImage
FindImagePos
GetScreenRes*
ScreenCapture*

*Note: GetScreenRes and ScreenCapture are no longer required (since v9.2) as you can now specify SCREEN for the haystack in FindImagePos

Building an Image Recognition Script

In this video I demonstrate the Image Recognition functions to automate Outlook running in a remote Citrix session. Traditionally automating interfaces in remote environments is difficult without having Macro Scheduler running on the remote server. Trying to automate a remote desktop with traditional commands is unreliable since the client doesn’t know anything about the windows and objects running on the remote server – all it sees is a graphical copy of the screen. This is why image recognition is so powerful in such situations.

Watch the video and then continue reading this article to learn how the script works.

In the video I first tell Macro Scheduler to find the position of a button on the screen. I first want the macro to find this button:

New Email Button

First I use the image capture tool to capture an image of this button from the screen and save it to a .bmp file. The image capture tool is available in the editor under the Tools menu, and also from the Capture button on the Command Builder dialog for the FindImagePos and other image recognition functions. I save the image to d:\citrix-images\new2.bmp.

The first thing I need my script to do is determine the dimensions of the screen. This is so that I can capture the entire screen later in the script. So I use the GetScreenRes function:

GetScreenRes

This returns the width and height of the screen in variables sX and sY. Next we capture the current screen to d:\screen.bmp using the ScreenCapture command:

ScreenCapture

So this captures the entire screen, from 0,0 to sX,sY returned by GetScreenRes.

Note: Since version 9.2 the above two steps are no longer necessary as it is now possible to specify SCREEN in FindImagePos and do not have to first copy the screen to a bitmap

Now we want the script to look for the new email button in this screen image using the FindImagePos function. This looks like this:

FindImagePos

This looks for new2.bmp inside screen.bmp using a color tolerance of 20 (where 0 is not tolerant and the pixel colors must match exactly and 255 is highly tolerant and anything would match!). We set the fourth parameter to 1 to tell it to return the center coordinates of the image match. If this was zero it would return the top left position in screen.bmp where a match was found. But I would like to return the coordinates of the center position of a match. i.e. the center of the new email button on the screen. XPos and YPos are our variables the coordinates will be stored in. These are arrays. So the first match will be in XPos_0,YPos_0, the second match (if any) in XPos_1, YPos_1, etc. imgs is the return variable which is set to the number of matches found.

So now we can say “if a match was found, move the mouse to the first position and click”. This looks like:

MouseMove

Since we captured the entire screen, the coordinates returned by FindImagePos map directly to screen coordinates. If the ScreenCapture command had not captured the entire screen we would have to add the top x,y coordinates given in the ScreenCapture command as offsets to the MouseMove command. As we captured the entire screen x,y was 0,0 anyway. Sometimes it is not necessary to work with the entire screen and we could instead capture just a Window. Like this:

MouseMove Relative

One would then have to add X,Y to any position returned by FindImagePos in order to map to an absolute screen position.

But back to the demo. We now have code which finds the new email button on the screen and clicks on it. This causes a new window to appear. If we were automating an application running on our desktop we could simply use the good old WaitWindowOpen command which is given the window title and waits for that window to appear. But, remember, this demo is automating a remote environment. The windows are on another computer. We don’t know about window titles. Citrix just sends us an image of the screen of the remote computer. So instead we will use the WaitScreenImage function. This is nice and easy:

WaitScreenImage

I used the capture tool to capture an image of the top left part of the new window – showing the window icon and title:

Window Title

That was saved to title2.bmp. So the above function simply watches the screen and waits until it finds this image on the screen. When that happens we know the new window has appeared.

By the way, you could create your own WaitScreenImage using a loop containing ScreenCapture and FindImagePos. This would give you more control and build in a timeout if required. But WaitScreenImage is a quick and easy way to wait for an image to appear on the screen without any further coding.

Finally, now the script knows the new window is present it can send some keystrokes, so we send some text to create the email. In the real world we’d probably use more image recognition to manipulate other objects on this new window and continue using the same techniques.

send keystrokes

Tips

For performance reasons the image recognition functions do not check every single pixel in each image. That would take far too long. Instead a random sample of pixels is taken from the small “needle” image and each of these is checked against all the possible matching pixels in the larger “haystack” image. For this reason it is possible to get false positives if the image you have captured is not very specific. E.g. consider these two images:

done

connecting

These images are largely similar since we have too much gray area. If the random sample is all from the gray area both would match. These are from a status bar, and if we were searching for Done we could easily find Connecting. The solution here is to make the image smaller so that there is less of the gray background. Alternatively find something more specific on the screen.

Note: You can override this behaviour by setting the number of pixels to scan using the FIP_SCANPIXELS variable, and since v12 this can be set to ALL to perform an exhaustive match and tests reveal no noticeable performance hit in doing so.

AppNavigator Does it Without Code

AppNavigator takes all the coding away from image recognition and also adds some more power. It is clever enough to narrow down a search if it first finds too many matches, or none at all, and it also remembers the part of the screen a match was found in. Next time it will concentrate on that part of the screen and only search the entire screen if a match is not found. This increases performance and reliability. With AppNavigator you can capture screen objects and assign them actions without having to write code like that presented here. AppNavigator can be told what to do when an object can’t be found, and can also navigate the extraction of data from Excel or Access without coding that either.

To see how easy it is to create the same process with AppNavigator see my next post.

Update 12th October 2007: Macro Scheduler 9.2 (and above) now lets you specify SCREEN in the haystack parameter of FindImagePos. Therefore if you are scanning the entire screen you no longer need to use GetScreenRes and ScreenCapture to capture the screen to a bitmap file first. Just specify SCREEN in FindImagePos.

« New Direct Access Beta | Remote Citrix Automation the Easy Way with AppNavigator »