tldr/pages/common/tesseract.md

2017-02-24 07:39:35 +00:00
# tesseract
> OCR (Optical Character Recognition) engine.
- Recognize text in an image and save it to `output.txt` (the `.txt` extension is added automatically):
`tesseract {{image.png}} {{output}}`
- Specify a custom language (default is English) with an ISO 639-2 code (e.g. `deu` for German):
`tesseract -l deu {{image.png}} {{output}}`
- List the ISO 639-2 codes of available languages:
`tesseract --list-langs`
- Specify a custom page segmentation mode (default is 3):
`tesseract --psm {{0_to_13}} {{image.png}} {{output}}`
- List page segmentation modes and their descriptions:
`tesseract --help-psm`
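
The commands above can be combined to process several images in one go. The following is a minimal sketch, not part of tesseract itself: the helper name `ocr_dir` and its arguments are hypothetical, and it assumes the images are PNG files and that the requested language data is installed.

```shell
# Hypothetical helper: run OCR on every PNG file in a directory.
# Usage: ocr_dir DIRECTORY [LANGUAGE_CODE]
ocr_dir() {
    dir=$1
    lang=${2:-eng}                    # assume English when no code is given
    for image in "$dir"/*.png; do
        [ -e "$image" ] || continue   # skip the literal pattern if nothing matched
        output=${image%.png}          # tesseract appends ".txt" to this base name
        tesseract -l "$lang" "$image" "$output" || return 1
    done
}
```

For example, `ocr_dir scans deu` would write `scans/page1.txt` next to a hypothetical `scans/page1.png`.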