OCR language

Technical support and scripting issues


Enricoys
Newbie
Posts: 19
Joined: Mon Jan 06, 2014 1:18 pm

OCR language

Post by Enricoys » Tue Feb 16, 2021 12:50 pm

Is it possible to use the OCR functionality with another language? Only the English 'eng.traineddata' file is installed in the tessdata folder. I downloaded a few files for other languages from GitHub, but how can I instruct MS to use them? The OCR_LANGDIR setting only refers to the folder, not to the language. I renamed another language file to 'eng.traineddata', which seems to work (no errors), but I would assume there is a more elegant way.
Anyway, I very much enjoy this software. Thanks.

Dorian (MJT support)
Automation Wizard
Posts: 885
Joined: Sun Nov 03, 2002 3:19 am

Re: OCR language

Post by Dorian (MJT support) » Tue Feb 16, 2021 1:55 pm

You should just need to drop the training files into the Macro Scheduler/tessdata folder.

EDIT: I have asked for clarification on this. We were recently asked this question in support, and the answer I gave above was the solution. Further testing is revealing this to be problematic (for me, at least).
Yes, we have a Custom Scripting Service. Message me or go here

Enricoys

Re: OCR language

Post by Enricoys » Tue Feb 16, 2021 4:05 pm

Thanks, Dorian, for your quick reply. Do I understand correctly that OCR always uses all the language (training) files found in the tessdata folder? If so, that would suit me perfectly, because OCR would then also work for images with mixed languages. Great!

Dorian (MJT support)

Re: OCR language

Post by Dorian (MJT support) » Tue Feb 16, 2021 5:48 pm

As soon as I have more answers I will post them here. The procedure was not what I was expecting it to be, so I need to gain some clarification before I can advise you.

Enricoys

Re: OCR language

Post by Enricoys » Thu Feb 18, 2021 9:42 am

Thanks, Dorian. As a workaround, I now rename the training data file for another language to 'eng.traineddata' in the tessdata folder. This works fine and indeed gives better results for that language. Mixed languages are not possible this way, but that is not a big deal.

Dorian (MJT support)

Re: OCR language

Post by Dorian (MJT support) » Thu Feb 18, 2021 10:22 am

You could possibly rename the appropriate files using Macro Scheduler. Then they could be renamed dynamically before each use and renamed back afterwards. However, I'm not sure whether Macro Scheduler needs to be restarted between changes. If you try this, please let us know. The other possible fly in the ointment is that renaming those files might require elevated permissions, in which case Macro Scheduler would need them too.

Enricoys

Re: OCR language

Post by Enricoys » Thu Feb 18, 2021 11:13 am

I actually use the Macro Scheduler CopyFile function. I have now tried RenameFile, but that didn't work (CF_Result_code=124), probably because, as you suspected, the folder is under 'Program Files (x86)'. Anyway, no problem: DeleteFile and CopyFile work fine in this folder, and I don't need to restart Macro Scheduler for the change to take effect. Problem solved for me, although it would of course have been nicer to control this with a setting (for instance, OCR_LANG=eng for English and OCR_LANG=nld for Dutch).
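For anyone on a version before v15.0.17, the copy-over workaround (overwrite 'eng.traineddata' with another language's file, restore English later) can be sketched as below. This is a minimal Python sketch of the file logic only, not Macro Scheduler script. It assumes the other language files are kept in a separate folder under their usual names (e.g. nld.traineddata); the backup name eng.traineddata.bak and the function names use_language and restore_english are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical backup name; Macro Scheduler does not use or expect this file.
BACKUP_NAME = "eng.traineddata.bak"


def use_language(tessdata_dir: Path, source_dir: Path, lang_code: str) -> None:
    """Overwrite eng.traineddata with <lang_code>.traineddata (pre-15.0.17 workaround)."""
    eng = tessdata_dir / "eng.traineddata"
    backup = tessdata_dir / BACKUP_NAME
    # Back up the original English file only once, so repeated swaps
    # never overwrite the backup with another language's data.
    if not backup.exists():
        shutil.copyfile(eng, backup)
    # Copy the chosen language over the file name OCR actually loads.
    shutil.copyfile(source_dir / f"{lang_code}.traineddata", eng)


def restore_english(tessdata_dir: Path) -> None:
    """Copy the backed-up English training file back into place."""
    backup = tessdata_dir / BACKUP_NAME
    shutil.copyfile(backup, tessdata_dir / "eng.traineddata")
```

Because the tessdata folder sits under 'Program Files (x86)', whatever performs these copies may need write access to it, as discussed in this thread.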

Dorian (MJT support)

Re: OCR language

Post by Dorian (MJT support) » Thu Feb 18, 2021 11:28 am

Good to know. Many thanks for the feedback. :)

Dorian (MJT support)

Re: OCR language

Post by Dorian (MJT support) » Fri Feb 19, 2021 10:25 am

We have asked the component makers about this. As soon as we have any clarification we'll reply here.

Dorian (MJT support)

Re: OCR language

Post by Dorian (MJT support) » Tue Feb 23, 2021 2:26 pm

New in v15.0.17, released today (see the Version History):

Taken from the Helpfile:
To use a language other than the default English language file, set the OCR_LANGCODE variable. This can be one or more languages. For multiple languages, separate the language codes with a + symbol. E.g.:

Let>OCR_LANGCODE=jpn

Let>OCR_LANGCODE=eng+jpn

Macro Scheduler ships with the standard English language file. Other language files can be saved to the tessdata folder. Language files from these repositories are supported:

https://github.com/tesseract-ocr/tessdata_fast
https://github.com/tesseract-ocr/tessdata_best
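If OCR_LANGCODE names a language whose file was never downloaded, it helps to check the tessdata folder first. The sketch below assumes Macro Scheduler follows Tesseract's usual convention that each code maps to a file named `<code>.traineddata` (as with the default eng.traineddata and the files in the repositories above). It is a minimal Python sketch; the helper names required_files and missing_files are hypothetical and not part of Macro Scheduler.

```python
from pathlib import Path


def required_files(langcode: str) -> list[str]:
    """Split an OCR_LANGCODE value such as 'eng+jpn' into expected file names."""
    # Language codes are separated by '+'; ignore empty pieces like 'eng+'.
    codes = [c.strip() for c in langcode.split("+") if c.strip()]
    return [f"{code}.traineddata" for code in codes]


def missing_files(tessdata_dir: Path, langcode: str) -> list[str]:
    """Return the expected .traineddata files that are not in tessdata_dir."""
    return [name for name in required_files(langcode)
            if not (tessdata_dir / name).is_file()]
```

For example, `required_files("eng+jpn")` returns `['eng.traineddata', 'jpn.traineddata']`, and `missing_files` pointed at your tessdata folder lists whichever of those still needs to be copied in.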

Enricoys

Re: OCR language

Post by Enricoys » Tue Feb 23, 2021 4:38 pm

Great job! Works perfectly now, especially with the 'tessdata_best' files.

Dorian (MJT support)

Re: OCR language

Post by Dorian (MJT support) » Tue Feb 23, 2021 4:58 pm

That's great to hear. Thank you for the feedback.
