{"id":2721,"date":"2016-08-05T15:37:25","date_gmt":"2016-08-05T15:37:25","guid":{"rendered":"https:\/\/www.mjtnet.com\/blog\/?p=2721"},"modified":"2016-08-05T15:37:25","modified_gmt":"2016-08-05T15:37:25","slug":"capture-screen-text-using-ocr","status":"publish","type":"post","link":"https:\/\/www.mjtnet.com\/blog\/2016\/08\/05\/capture-screen-text-using-ocr\/","title":{"rendered":"Capture Screen Text using OCR"},"content":{"rendered":"<p>Here&#8217;s a way to get screen text from any application &#8211; even from an image &#8211; using OCR and a free open source tool called Tesseract.<\/p>\n<p>First, you need to download and install Tesseract. <a href=\"https:\/\/github.com\/tesseract-ocr\/tesseract\/wiki\/Downloads\">You can get it here<\/a>.<\/p>\n<p>Tesseract is a command line utility.  The most basic syntax is:<\/p>\n<blockquote><p>tesseract.exe input_image_file output_text_file<\/p><\/blockquote>\n<p>So you could call it from a Macro Scheduler script something like this:<\/p>\n<pre class=\"brush:macroscript\">\/\/Capture screen to bmp file - you could instead capture only a window or use FindObject to get coordinates of a specific object\r\nGetScreenRes>X2,Y2\r\nScreenCapture>0,0,X2,Y2,%SCRIPT_DIR%\\screen.bmp\r\n\r\n\/\/run tesseract on the screen grab and output to temporary file\r\nLet>RP_WAIT=1\r\nLet>RP_WINDOWMODE=0\r\nRun>\"C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe\" \"%SCRIPT_DIR%\\screen.bmp\" \"%SCRIPT_DIR%\\tmp\"\r\n\r\n\/\/read temporary file into memory and delete it\r\nReadFile>%SCRIPT_DIR%\\tmp.txt,theText\r\nDeleteFile>%SCRIPT_DIR%\\tmp.txt\r\n\r\n\/\/Display the text in a message box \r\nMessageModal>theText<\/pre>\n<p>This example simply captures the entire screen.  You probably wouldn&#8217;t normally want to do this.  
Instead, you could capture a specific window:<\/p>\n<pre class=\"brush:macroscript\">\r\n\/\/Capture just the Notepad window\r\nSetFocus>Untitled - Notepad\r\nGetWindowPos>Untitled - Notepad,X1,Y1\r\nGetWindowSize>Untitled - Notepad,w,h\r\nScreenCapture>X1,Y1,{%X1%+%w%},{%Y1%+%h%},%SCRIPT_DIR%\\screen.bmp<\/pre>\n<p>Or even a specific object:<\/p>\n<pre class=\"brush:macroscript\">\/\/Capture just the editor portion of Notepad\r\nSetFocus>Untitled - Notepad\r\nGetWindowHandle>Untitled - Notepad,hWndParent\r\nFindObject>hWndParent,Edit,,1,hWnd,X1,Y1,X2,Y2,result\r\nScreenCapture>X1,Y1,X2,Y2,%SCRIPT_DIR%\\screen.bmp<\/pre>\n<p>Either way, you then have a screen bitmap that you can pass to Tesseract.<\/p>\n<p>Once you&#8217;ve retrieved the text, you will probably want to parse it, for example with <a href=\"https:\/\/www.mjtnet.com\/manual\/regex.htm\">RegEx<\/a>.  <a href=\"http:\/\/help.mjtnet.com\/article\/12-my-most-used-regex\">Here&#8217;s an article on a regular expression useful for parsing out data<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s a way to get screen text from any application &#8211; even from an image &#8211; using OCR and a free open source tool called Tesseract. First, you need to download and install Tesseract. You can get it here. Tesseract is a command line utility. 
The most basic syntax is: tesseract.exe input_image_file output_text_file So you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,5,6],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts\/2721"}],"collection":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/comments?post=2721"}],"version-history":[{"count":3,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts\/2721\/revisions"}],"predecessor-version":[{"id":2724,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/posts\/2721\/revisions\/2724"}],"wp:attachment":[{"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/media?parent=2721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/categories?post=2721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mjtnet.com\/blog\/wp-json\/wp\/v2\/tags?post=2721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}