OCR language
Moderators: Dorian (MJT support), JRL
Is it possible to use the OCR functionality for another language? In the tessdata folder the English 'eng.traineddata' file is installed. I downloaded a few files for other languages from GitHub, but how can I instruct Macro Scheduler to use them? The OCR_LANGDIR setting only refers to the folder, not to the language. I renamed another language file to 'eng.traineddata', which seems to work (no errors), but I would assume there is a cleaner way.
Anyway, I very much enjoy this software. Thanks.
- Dorian (MJT support)
- Automation Wizard
- Posts: 1378
- Joined: Sun Nov 03, 2002 3:19 am
- Contact:
Re: OCR language
You should just need to drop the training files into the Macro Scheduler/tessdata folder.
EDIT: I have asked for clarification on this. We were recently asked this question in support and the solution was the answer I gave above. Further testing is revealing this to be problematic (for me, at least).
Yes, we have a Custom Scripting Service. Message me or go here
Re: OCR language
Thanks, Dorian, for your quick reply. Do I understand correctly that OCR always uses all the language (training) files found in the tessdata folder? If so, that would suit me perfectly, because OCR would then also work for images with mixed languages. Great!
- Dorian (MJT support)
Re: OCR language
As soon as I have more answers I will post them here. The procedure was not what I was expecting it to be, so I need to gain some clarification before I can advise you.
Re: OCR language
Thanks, Dorian. As a workaround, I now copy the trained data file for another language into the tessdata folder under the name 'eng.traineddata'. This works fine and does indeed give better results for that language. Mixed-language OCR is not possible this way, but that is not a big deal.
- Dorian (MJT support)
Re: OCR language
You could possibly rename the appropriate files using Macro Scheduler. They could then be renamed dynamically before each use and renamed back afterwards. However, I'm not sure whether Macro Scheduler needs to be restarted after each change. If you try this, please let us know. The other possible fly in the ointment is that renaming those files might require elevated permissions, in which case Macro Scheduler would need them too.
Re: OCR language
I actually use the Macro Scheduler CopyFile function. I have now tried RenameFile as well, but that didn't work (CF_Result_code=124), probably because the file is in the 'Program Files (x86)' folder, as you suspected. Anyway, no problem: DeleteFile and CopyFile work fine in this folder, and I don't need to restart Macro Scheduler to make it work. Problem solved for me, but of course it would have been nicer if I could control this with a setting (for instance OCR_LANG=eng for English and OCR_LANG=nld for Dutch).
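For readers who want to try the same swap from outside Macro Scheduler, here is a minimal Python sketch of the delete-and-copy idea described above. It is not the script used in this thread: the function name `use_language_as_default` and the one-time `eng.traineddata.bak` backup are assumptions added for illustration, and the folder you pass in is whatever your own tessdata folder happens to be.

```python
import shutil
from pathlib import Path


def use_language_as_default(tessdata_dir: str, code: str) -> Path:
    """Copy '<code>.traineddata' over 'eng.traineddata' in tessdata_dir.

    The original English file is saved once as 'eng.traineddata.bak'.
    That backup step is this sketch's own addition, not part of the
    workaround described in the thread.
    """
    folder = Path(tessdata_dir)
    source = folder / f"{code}.traineddata"
    target = folder / "eng.traineddata"
    backup = folder / "eng.traineddata.bak"

    if not source.is_file():
        raise FileNotFoundError(f"No language file found: {source}")

    # Asking for 'eng' itself means there is nothing to swap.
    if source == target:
        return target

    # Keep only the first English file; later swaps must not overwrite it.
    if target.is_file() and not backup.exists():
        shutil.copyfile(target, backup)

    # Copy rather than rename, mirroring the CopyFile-based approach.
    shutil.copyfile(source, target)
    return target
```

Calling `use_language_as_default(tessdata_folder, "nld")` would make the Dutch data the file that gets loaded as "English". As noted in this thread, writing into a folder under 'Program Files (x86)' may require elevated permissions.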
- Dorian (MJT support)
Re: OCR language
Good to know. Many thanks for the feedback.
- Dorian (MJT support)
Re: OCR language
We have asked the component makers about this. As soon as we have any clarification we'll reply here.
- Dorian (MJT support)
Re: OCR language
New in v15.0.17, released today:
Version History.
Taken from the Helpfile:
To use a language other than the default English language file, set the OCR_LANGCODE variable. This can be one or more languages. For multiple languages, separate the language codes with a + symbol. E.g.:
Let>OCR_LANGCODE=jpn
Let>OCR_LANGCODE=eng+jpn
Macro Scheduler ships with the standard English language file. Other language files can be saved to the tessdata folder. Language files from these repositories are supported:
https://github.com/tesseract-ocr/tessdata_fast
https://github.com/tesseract-ocr/tessdata_best
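Since every code in OCR_LANGCODE needs a matching file in the tessdata folder, a quick check can save some head-scratching. Below is a minimal Python sketch, run outside Macro Scheduler, that splits an OCR_LANGCODE-style value on the + symbol and reports which `<code>.traineddata` files are missing. The function name `missing_language_files` and the install path in the example are hypothetical; the `<code>.traineddata` naming follows the 'eng.traineddata' file mentioned earlier in this thread.

```python
from pathlib import Path


def missing_language_files(langcode: str, tessdata_dir: str) -> list[str]:
    """Return the codes in an OCR_LANGCODE-style string (e.g. 'eng+jpn')
    that have no '<code>.traineddata' file in tessdata_dir."""
    folder = Path(tessdata_dir)
    # Multiple languages are separated by '+'; ignore stray whitespace.
    codes = [c.strip() for c in langcode.split("+") if c.strip()]
    return [c for c in codes if not (folder / f"{c}.traineddata").is_file()]


if __name__ == "__main__":
    # Hypothetical install path; adjust to your own Macro Scheduler folder.
    tessdata = r"C:\Program Files (x86)\Macro Scheduler 15\tessdata"
    print(missing_language_files("eng+jpn", tessdata))
```

An empty list means every requested language file is present. Any code that is returned still needs its file downloaded from one of the repositories above and saved to the tessdata folder.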
Re: OCR language
Great job! It works perfectly now, especially with the 'tessdata_best' files.
- Dorian (MJT support)
Re: OCR language
That's great to hear. Thank you for the feedback.