
I only want to read digits. Since tesseract 4.0 does not support a whitelist, I downloaded Shreeshrii's tessdata files, which are trained to read digits only. https://github.com/Shreeshrii/tessdata_shreetest

I copied all of the files into Program Files (x86)/Tesseract-OCR/tessdata.

However, when I tried this in my code:

text = pytesseract.image_to_string(img, lang='digit_comma', config='OEM_LSTM_ONLY')
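For reference, pytesseract runs the tesseract command-line program and appends the `config` string to its arguments, so `config` expects CLI flags rather than the C++ enum name `OEM_LSTM_ONLY`. The equivalent flag is `--oem 1` (LSTM engine only), and `--tessdata-dir` tells tesseract which folder to search for `.traineddata` files. A minimal sketch; `build_config` is a hypothetical helper, not part of pytesseract, and the double quotes around the directory follow the pytesseract README's advice for paths containing spaces:

```python
def build_config(oem=None, psm=None, tessdata_dir=None):
    """Build an option string for pytesseract's `config` argument.

    build_config is a hypothetical helper, not a pytesseract function.
    """
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")  # 1 = LSTM engine only
    if psm is not None:
        parts.append(f"--psm {psm}")  # page segmentation mode
    if tessdata_dir is not None:
        # Quote the path so "Program Files (x86)" stays a single argument.
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    return " ".join(parts)
```

With it, the call above would become something like `pytesseract.image_to_string(img, lang='digit_comma', config=build_config(oem=1, tessdata_dir=r'C:\Program Files (x86)\Tesseract-OCR\tessdata'))`, where the directory is assumed to be the default install location.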

I got this error:

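pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\Program Files (x86)\Tesseract-OCR\digit.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'digit\' Tesseract couldn\'t load any languages!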
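The message means tesseract looked for a file named `<lang>.traineddata` in the directory it was searching and did not find it there. Note that the path in the error is `Tesseract-OCR\digit.traineddata`, not a file inside `Tesseract-OCR\tessdata`. A small pre-flight check can make such a mismatch obvious before pytesseract is called. This is a sketch; `check_traineddata` is a hypothetical helper that only inspects the filesystem:

```python
from pathlib import Path


def check_traineddata(tessdata_dir, lang):
    """Return the .traineddata paths for `lang`, raising if any file is missing.

    Hypothetical helper: it only inspects the filesystem and never runs tesseract.
    """
    # tesseract accepts several languages joined with '+', e.g. 'eng+digit_comma'.
    paths = [Path(tessdata_dir) / f"{name}.traineddata" for name in lang.split("+")]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("traineddata not found: " + ", ".join(missing))
    return paths
```

For example, `check_traineddata(r'C:\Program Files (x86)\Tesseract-OCR\tessdata', 'digit_comma')` either returns the file path or names the exact file tesseract would fail to open.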

I checked that this one works fine:

text = pytesseract.image_to_string(img, lang='eng', config='OEM_LSTM_ONLY')

Also, if I type "tesseract --list-langs" in cmd, it lists all of the languages I added from Shreeshrii.
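`tesseract --list-langs` reports what the command-line binary finds, which is not necessarily what the Python process sees (for example, if it uses a different `tesseract_cmd` or environment). One way to compare is to run the same command from Python and parse it. This sketch assumes the output is a header line such as `List of available languages (N):` followed by one name per line; `parse_list_langs` and `list_langs` are hypothetical helpers (newer pytesseract releases also offer `pytesseract.get_languages()`):

```python
import subprocess


def parse_list_langs(output):
    """Extract language names from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header, e.g. "List of available languages (3):".
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def list_langs(tesseract_cmd="tesseract", tessdata_dir=None):
    """Run tesseract --list-langs and return the parsed language names."""
    cmd = [tesseract_cmd]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd.append("--list-langs")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Fall back to stderr in case this tesseract build prints the list there.
    return parse_list_langs(result.stdout or result.stderr)
```

If `list_langs()` shows the new language but `image_to_string` still fails, the Python side is likely calling a different tesseract binary or tessdata directory.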

How can I use Shreeshrii's data in my code?

Update

I found that my tesseract version was 4.0.0 beta, while Shreeshrii's data only works with the 4.0.0 release. After reinstalling tesseract 4.0.0, it works correctly.

How do I add a new language to tesseract and use it?
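In outline, a tesseract language is a `<lang>.traineddata` file: adding one means copying that file into the tessdata directory, then passing its base name as `lang`. In tesseract 4, `TESSDATA_PREFIX` is expected to point at the tessdata directory itself, as the error message above asks. A sketch under those assumptions; `install_traineddata` and `ocr_with_lang` are hypothetical helpers, `--oem 1` assumes the file contains an LSTM model, and OCR additionally needs pytesseract, Pillow and a tesseract install:

```python
import os
import shutil
from pathlib import Path


def install_traineddata(src, tessdata_dir):
    """Copy a .traineddata file into tessdata_dir and return its language name."""
    src = Path(src)
    if src.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {src}")
    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest_dir / src.name)
    # The name tesseract expects for `lang` is the file name without its extension.
    return src.stem


def ocr_with_lang(image_path, lang, tessdata_dir):
    """OCR an image with a custom language (needs pytesseract, Pillow, tesseract)."""
    import pytesseract  # imported here so install_traineddata works without it
    from PIL import Image

    # tesseract 4 reads TESSDATA_PREFIX as the tessdata directory itself;
    # the tesseract subprocess started by pytesseract inherits os.environ.
    os.environ["TESSDATA_PREFIX"] = str(tessdata_dir)
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config="--oem 1")
```

After `install_traineddata`, `tesseract --list-langs` should include the returned name, and that same name is what goes into `lang`.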
