{"id":2865,"date":"2018-03-23T21:55:33","date_gmt":"2018-03-23T21:55:33","guid":{"rendered":"https:\/\/www.mjtnet.com\/blog\/?p=2865"},"modified":"2018-04-03T11:40:44","modified_gmt":"2018-04-03T11:40:44","slug":"screen-scraping-with-macro-scheduler-update","status":"publish","type":"post","link":"https:\/\/www.mjtnet.com\/blog\/2018\/03\/23\/screen-scraping-with-macro-scheduler-update\/","title":{"rendered":"Screen Scraping with Macro Scheduler [Update]"},"content":{"rendered":"<h2>What is Screen Scraping?<\/h2>\n<p>Screen Scraping is a term used to describe the process of a computer program or macro extracting data from the display output of another application. Rather than parsing data from the database or data files belonging to an application, Screen Scraping pulls the data from the screen itself, extracting data that was intended to be displayed to the end-user as opposed to data designed for output to another application or database. \u00a0<\/p>\n<p>Screen Scraping is necessary when there is a need to access the information displayed by the application but there is no method provided to access it behind the scenes. The database or data files may not be accessible, or may be undocumented or proprietary and therefore cannot be parsed easily; the costs associated with interacting with the database may be too high; or the license agreement or warranty may prohibit it. In the case of legacy systems that are no longer supported there may be no knowledge of the data structures, or the technology used may no longer be compatible with current technology. In these cases we must resort to extracting the data from the screen &#8211; from the windows of the application.<\/p>\n<p>The term Screen Scraping probably originates from the era of computer terminals when you could connect the terminal output of a computer to an input port on another and therefore record the screen data. 
\u00a0<\/p>\n<h2>Screen Scraping Methods<\/h2>\n<p>There are a number of ways we can retrieve information from the screen using <a href=\"http:\/\/www.mjtnet.com\/macro_scheduler.htm\">Macro Scheduler<\/a>, depending on the type of application the data is in.<\/p>\n<h2>Screen Scraping Web Applications<\/h2>\n<p>Applications like Macro Scheduler&#8217;s <a href=\"http:\/\/www.mjtnet.com\/webrecorder.htm\">WebRecorder<\/a> can access the data and objects inside an Internet Explorer window and can therefore be used to extract the data. \u00a0Technically speaking I would not call this screen scraping since WebRecorder is using an API provided by Internet Explorer, but the process of extracting information from web sites is commonly referred to as Screen Scraping. \u00a0With <a href=\"http:\/\/www.mjtnet.com\/webrecorder.htm\">WebRecorder<\/a> we can use the ExtractTag wizard to create code that extracts the text from a particular element in the page. \u00a0While WebRecorder is the easiest way to do it, it is also possible to automate IE and extract data from web pages by using VBScript. The following forum posts may help:<\/p>\n<p><a href=\"http:\/\/www.mjtnet.com\/usergroup\/viewtopic.php?t=1511\">Automate Internet Explorer with OLE\/ActiveX<\/a><br \/>\n<a href=\"http:\/\/www.mjtnet.com\/usergroup\/viewtopic.php?p=6185\">Automate web forms with IE<\/a><br \/>\n<a href=\"http:\/\/www.mjtnet.com\/usergroup\/viewtopic.php?p=6585\">HTTP GET and POST using VBscript<\/a><\/p>\n<h2>Screen Scraping Microsoft Office Applications<\/h2>\n<p>Microsoft Office Applications, like Internet Explorer, have a COM interface that allows scripts to manipulate them and access the data held within them. \u00a0Again, this is not really scraping data from the screen itself, as you are getting it directly from a programming interface. 
There are a number of examples in the forums and blog archives and also some sample scripts that come with Macro Scheduler which demonstrate how to automate Office applications and retrieve data from them. \u00a0<\/p>\n<p><a href=\"http:\/\/www.mjtnet.com\/blog\/2007\/07\/02\/methods-for-accessing-excel-data\/\">Working with Excel<\/a><\/p>\n<h2>Screen Scraping Regular Windows Applications<\/h2>\n<p>Most other applications don&#8217;t offer a scripting interface like MS Office or Internet Explorer. \u00a0This is where we really need to work directly with the screen. \u00a0There are a number of ways we can do this kind of Screen Scraping with Macro Scheduler.<\/p>\n<h2>Screen Scraping via Optical Character Recognition<\/h2>\n<p>Macro Scheduler 14.4 includes some neat functions which make it easy to OCR all or part of the screen:<\/p>\n<ul>\n<li>OCRScreen<\/li>\n<li>OCRWindow<\/li>\n<li>OCRArea<\/li>\n<\/ul>\n<p>The first of these is the simplest: it scans the entire screen and returns all the text it can recognise. Of course this is also the slowest as it has to perform OCR against the entire screen. OCRWindow takes a window title and scans only the area of the screen where that window appears. This is nice and simple and a good compromise if the window isn&#8217;t too large. Finally, OCRArea can be given a rectangular screen region (X1,Y1,X2,Y2). You could use FindObject to find the coordinates of a specific UI object and pass those coordinates to OCRArea if you want to narrow things right down.<\/p>\n<h2>The Text Capture Functions<\/h2>\n<p>Macro Scheduler includes some <a href=\"http:\/\/www.mjtnet.com\/blog\/2007\/12\/12\/capturing-screen-text\/\">Text Capture<\/a> functions which can be used to extract text from a given window, rectangular screen area or screen point. \u00a0These functions use low-level system hooks which monitor applications calling the various &#8220;TextOut&#8221; functions that Windows uses to output text to the screen. 
\u00a0By doing so they are able to capture this text. \u00a0The Text Capture functions return the text to a variable which you can then use as needed. \u00a0<\/p>\n<p>However, a few applications don&#8217;t use the Windows built-in functions to create and output text. \u00a0Don&#8217;t worry &#8211; <strong>most do<\/strong>, but a few use their own techniques. \u00a0Text on the screen is just a sequence of small dots, so if the application programmer decided to build their own routine to assemble text from dots rather than calling the Windows functions which already do that, the Text Capture functions are not going to be able to capture it. <\/p>\n<p>The text capture functions and their limitations are <a href=\"http:\/\/www.mjtnet.com\/blog\/2007\/12\/12\/capturing-screen-text\/\">explained here<\/a>. \u00a0There is an <a href=\"http:\/\/www.mjtnet.com\/blog\/2008\/01\/03\/screen-scrape-text-capture-example\/\">example application<\/a>, created with Macro Scheduler, which you can use to determine whether or not the text you want to capture can be captured using the text capture functions.<\/p>\n<p><a href=\"http:\/\/www.mjtnet.com\/blog\/2007\/12\/12\/capturing-screen-text\/\">http:\/\/www.mjtnet.com\/blog\/2007\/12\/12\/capturing-screen-text\/<\/a><br \/>\n<a href=\"http:\/\/www.mjtnet.com\/blog\/2008\/01\/03\/screen-scrape-text-capture-example\/\">http:\/\/www.mjtnet.com\/blog\/2008\/01\/03\/screen-scrape-text-capture-example\/<\/a><\/p>\n<h2>Using the Clipboard for Screen Scraping<\/h2>\n<p>If the text you want to capture is selectable then you can use the clipboard to retrieve it. \u00a0A Macro Scheduler macro can send the keystrokes necessary to highlight and copy the text to the clipboard and then use the GetClipboard function to retrieve that text to a variable. 
\u00a0This is far less elegant than using the Text Capture functions but might be necessary if the application concerned is not utilising any of the Windows &#8220;TextOut&#8221; functions to create the text.<\/p>\n<pre name=\"code\" class=\"macroscript\">SetFocus>Notepad*\r\n\/\/Select ALL\r\nPress CTRL\r\nSend>a\r\nRelease CTRL\r\n\/\/Copy to clipboard\r\nPress CTRL\r\nSend>c\r\nRelease CTRL\r\n\r\n\/\/Get and display the data\r\nWaitClipboard\r\nGetClipboard>theData\r\nMessageModal>theData<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>What is Screen Scraping? Screen Scraping is a term used to describe the process of a computer program or macro extracting data from the display output of another application. Rather than parsing data from the database or data files belonging to an application, Screen Scraping pulls the data from the screen itself, extracting data that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts\/2865"}],"collection":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/comments?post=2865"}],"version-history":[{"count":3,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts\/2865\/revisions"}],"predecessor-version":[{"id":2910,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts\/2865\/revisions\/2910"}],"wp:attachment":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/media?parent=2865"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\
/wp\/v2\/categories?post=2865"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/tags?post=2865"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}