Question

I'm trying to run tesseract-ocr on this image, but without success:

> wget http://i.imgur.com/dOtlrvx.png
...
> convert dOtlrvx.png dOtlrvx.tif
> tesseract dOtlrvx.tif out -psm 10 && cat out.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
.

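The recognized character is a dot: "."

-psm 10 means "treat the image as a single character", so I think it is the right choice. I also tried the other possible psm values, and none of them worked either.

Does anyone know why this doesn't work? Any suggestions are welcome!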

Thanks

Answer 1

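Create a new config file for tesseract containing the line tessedit_char_whitelist 0123456789, then pass that file as the last argument when processing your image: tesseract dOtlrvx.tif out -psm 10 your_config_file. The whitelist restricts recognition to digits, so the engine can no longer return punctuation such as ".".

This worked for me.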
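The steps in this answer can be sketched as a short shell session. This is only a minimal sketch: the config file name digits_only is a placeholder chosen for illustration, the image is assumed to have already been converted to dOtlrvx.tif as in the question, and the -psm spelling matches the v3.02 build shown above (newer Tesseract releases spell the option --psm).

```shell
# Write a one-line config file that limits recognition to the digits 0-9.
# "digits_only" is a hypothetical file name; any name works.
printf 'tessedit_char_whitelist 0123456789\n' > digits_only

# Run OCR in single-character mode with that config, but only if tesseract
# and the converted image from the question are actually available.
if command -v tesseract >/dev/null 2>&1 && [ -f dOtlrvx.tif ]; then
    tesseract dOtlrvx.tif out -psm 10 digits_only && cat out.txt
fi
```

Keeping the whitelist in a file rather than typing it each time makes it easy to reuse the same restriction for other digit-only images.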
