Question

I am using tesseract OCR with python-tesseract. The tesseract FAQ has this to say about recognizing digits:

Use

TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

before calling an Init function, or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

and then your command line becomes:

tesseract image.tif outputbase nobatch digits

Warning: until the old and new config variables get merged, you must pass the nobatch parameter too.

python-tesseract provides a SetVariable method. I tried the following, but the OCR result is unchanged (non-digit characters still appear):

api = tesseract.TessBaseAPI()
api.SetVariable("tessedit_char_whitelist", "0123456789")
api.Init('.','eng',tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)

Has anyone gotten this to work, or should I treat it as a bug in python-tesseract?

Answer 1

OK, solved it. According to this (unofficial?) documentation for tesseract-ocr, SetVariable() must be called after Init(), even though the official FAQ says the opposite. Calling it after Init() works as expected.
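
For reference, here is a minimal sketch of the corrected call order, using only the calls from the question. The helper name make_digits_api and its parameters are my own, purely for illustration; the tesseract module is passed in as an argument so the function does not depend on how python-tesseract is installed.

```python
def make_digits_api(tesseract, datapath=".", lang="eng"):
    """Return a TessBaseAPI restricted to digits.

    `tesseract` is the imported python-tesseract module. The helper name
    and its parameters are hypothetical, not part of python-tesseract.
    """
    api = tesseract.TessBaseAPI()
    # Init() first: a whitelist set before Init() had no effect in the question.
    api.Init(datapath, lang, tesseract.OEM_DEFAULT)
    # Set the whitelist only after Init() so it is actually applied.
    api.SetVariable("tessedit_char_whitelist", "0123456789")
    api.SetPageSegMode(tesseract.PSM_AUTO)
    return api
```

Assuming python-tesseract is installed, usage would be `import tesseract` followed by `api = make_digits_api(tesseract)`.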

python-tesseract OCR: get digits only
