How do I train a Tesseract for a specific font?

August 18, 2021 by Author

Table of Contents

1 How do I train a Tesseract for a specific font?
2 How do you prepare training data for Tesseract?
3 What is OCR font in computer?
4 What is Tesstrain sh?

How do I train a Tesseract for a specific font?

To train Tesseract on a specific font, first place the font file in the /fonts directory. The first step in the training process is then to generate the training data. In our case, we use the tesstrain.sh script provided by Tesseract, which creates the training data and adds it to the /train folder.
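As a rough sketch of what such a call could look like, the Python snippet below assembles a tesstrain.sh command line without running it. The flags shown (--fonts_dir, --fontlist, --lang, --linedata_only, --langdata_dir, --tessdata_dir, --output_dir) are ones the Tesseract 4 version of the script accepts. The font name, the language code, and the langdata and tessdata paths are placeholder assumptions, not values taken from this article.

```python
import shlex


def build_tesstrain_command(font_name, lang="eng",
                            fonts_dir="/fonts", output_dir="/train",
                            langdata_dir="./langdata",
                            tessdata_dir="./tessdata"):
    """Assemble (but do not run) a tesstrain.sh invocation as an argument list.

    langdata_dir and tessdata_dir are hypothetical locations; point them at
    wherever your langdata checkout and existing .traineddata files live.
    """
    return [
        "tesstrain.sh",
        "--fonts_dir", fonts_dir,        # directory that holds the font file
        "--fontlist", font_name,         # font name as the system reports it
        "--lang", lang,
        "--linedata_only",               # only produce line data for LSTM training
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--output_dir", output_dir,      # where the training data is written
    ]


if __name__ == "__main__":
    # "My Custom Font" is a made-up name used purely for illustration.
    cmd = build_tesstrain_command("My Custom Font")
    print(shlex.join(cmd))
```

Once the training tools are installed, the same list could be handed to subprocess.run(cmd, check=True). Note that newer Tesseract releases may ship a different training entry point, so check which script your installation provides.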

How do I add fonts to Tesseract?

When generating the .tiff file, you can set the font on which you want to train Tesseract. To generate the .traineddata file, you can use either jTessBoxEditor or serak-tesseract-trainer. I have used both, and I would say that jTessBoxEditor is great for generating .tiff and box files, while serak is the better choice for training Tesseract.

How do you prepare training data for Tesseract?

In general, the training steps for Tesseract are:

Merge the training data into a .tiff file using jTessBoxEditor.
Create the training labels: generate a .box file containing Tesseract's predictions for the .tiff file, then fix each inaccurate prediction.
Train Tesseract.
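To make the label-fixing step more concrete, here is a minimal Python sketch that reads box-file lines and corrects one wrong prediction. It assumes the classic character-level box format, where each line is a symbol followed by left, bottom, right, and top pixel coordinates and a page number. The helper names and the sample lines are made up for illustration; in practice jTessBoxEditor lets you make the same corrections in a GUI.

```python
from typing import NamedTuple


class BoxEntry(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one '<symbol> <left> <bottom> <right> <top> <page>' line."""
    # Split from the right so the five numeric fields are always separated
    # from the symbol, whatever the symbol looks like.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return BoxEntry(symbol, int(left), int(bottom), int(right), int(top), int(page))


def format_box_line(entry):
    """Turn a BoxEntry back into a box-file line."""
    return (f"{entry.symbol} {entry.left} {entry.bottom} "
            f"{entry.right} {entry.top} {entry.page}")


def correct_symbol(lines, index, new_symbol):
    """Return new box lines with the symbol at `index` replaced."""
    entries = [parse_box_line(line) for line in lines]
    entries[index] = entries[index]._replace(symbol=new_symbol)
    return [format_box_line(entry) for entry in entries]


if __name__ == "__main__":
    # Made-up predictions: suppose the second glyph is really an "e".
    predicted = ["T 10 20 30 60 0", "c 32 20 48 45 0"]
    for line in correct_symbol(predicted, 1, "e"):
        print(line)
```

Only the symbol is changed here; the bounding-box coordinates are left as Tesseract predicted them, which is usually what you want when the segmentation was right but the recognized character was wrong.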

Can tesseract read handwriting?

Tesseract OCR doesn’t work well on handwritten text. When we pass a handwritten segment into Tesseract, we get very poor results. For handwritten text, we will use the Google Cloud Vision API to get better results.

What is OCR font in computer?

OCR-A is a font created in 1968, in the early days of computer optical character recognition, when there was a need for a font that could be recognized not only by the computers of that day, but also by humans. OCR-A uses simple, thick strokes to form recognizable characters.

Can you train Tesseract?

Luckily, you can train Tesseract so that it can read your font easily.

What is Tesstrain sh?

tesstrain.sh is a script that automatically calls the appropriate programs to create new training data for a language. It relies on several training programs, so you need to build them with ‘make training’ before using it. Not all files are required for LSTM training.
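Because the script depends on separately built programs, it can help to check that they are available before starting. The Python sketch below looks up a few of the Tesseract training executables on the PATH. The tool list is an illustrative subset chosen for this example, not the full set that tesstrain.sh calls, and the function names are hypothetical.

```python
import shutil

# Illustrative subset of the programs built by 'make training';
# the real script may call more tools than these.
TRAINING_TOOLS = (
    "text2image",
    "unicharset_extractor",
    "combine_tessdata",
    "lstmtraining",
)


def find_training_tools(tools=TRAINING_TOOLS):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found):
    """List the tool names whose lookup returned None."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    missing = missing_tools(find_training_tools())
    if missing:
        print("Missing training tools:", ", ".join(missing))
    else:
        print("All listed training tools were found.")
```

If anything is reported missing, building the training tools (or installing the matching package for your platform) is the likely next step.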
