Why does this return the following error?
root@amd-3700-2gb ~/ocr_test # tesseract -l dan pdf.png out pdf
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error opening data file /usr/local/share/tessdata/osd.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load
root@amd-3700-2gb ~/ocr_test # tesseract --list-langs
List of available languages (3):
eng
dan
dan-frak
This works fine and writes the recognized text to out.txt:
tesseract -l dan pdf.png out
This creates out.pdf, but it also prints the error shown above again, and the searchable text layer in the PDF is meaningless:
tesseract -l dan pdf.png out pdf
Answer (score: 6)
The error message is clear: it needs the osd.traineddata file. You can install or download the Orientation & Script Detection data for Tesseract from https://github.com/tesseract-ocr/tessdata.
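A minimal shell sketch for checking which data files Tesseract cannot find before rerunning the pdf command. It assumes the 3.x layout described in the error message: data files live in $TESSDATA_PREFIX/tessdata when TESSDATA_PREFIX is set (the variable points at the parent of tessdata), otherwise in /usr/local/share/tessdata, the path shown in the error above. The helpers tessdata_dir and check_langs are hypothetical names for illustration, not part of Tesseract. In the failing run above, Tesseract reports that auto orientation and script detection was requested, so osd is checked alongside dan.

```shell
#!/bin/sh
# Sketch only. Assumption: Tesseract 3.x lookup rules as stated in the error
# message; TESSDATA_PREFIX is the PARENT of the tessdata directory.

# tessdata_dir: print the directory where .traineddata files are expected.
tessdata_dir() {
    if [ -n "$TESSDATA_PREFIX" ]; then
        # Strip one trailing slash so we don't print ".../prefix//tessdata".
        printf '%s\n' "${TESSDATA_PREFIX%/}/tessdata"
    else
        # Default path taken from the error message in the question.
        printf '%s\n' /usr/local/share/tessdata
    fi
}

# check_langs LANG...: print one "missing: <path>" line per absent data file.
# Returns 0 if every file exists, 1 otherwise.
check_langs() {
    dir=$(tessdata_dir)
    status=0
    for lang in "$@"; do
        if [ ! -f "$dir/$lang.traineddata" ]; then
            echo "missing: $dir/$lang.traineddata"
            status=1
        fi
    done
    return "$status"
}

# The failing pdf run requested orientation/script detection, so osd is
# required in addition to the recognition language dan.
check_langs dan osd || echo "install the missing files, then rerun tesseract"
```

If osd.traineddata is reported missing, place the file downloaded from the repository in that directory (or point TESSDATA_PREFIX at the parent of a tessdata directory that contains it), then rerun tesseract --list-langs; osd should then appear in the list.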