Question

I want to use a new font in pytesseract, and I trained it with this tool: http://trainyourtesseract.com/. The tool gives me a file named *.traineddata.

Right now I am using this simple script:

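try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract as tes

# Older pytesseract releases took boxes=True on image_to_string();
# newer releases provide the same box output through image_to_boxes().
results = tes.image_to_boxes(Image.open('./test.jpg'))
# Append the result to a file and close the handle when done.
with open('parsing.text', 'a') as file:
    file.write(results)
print(results)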

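How can I use my traineddata file so that the Python script can read the new font?

Thanks!

Edit #1: So I understand that *.traineddata can be used with Tesseract as a command-line program. My question stays the same: how do I use the trained data from Python?

Edit #2: The answer to my question is in "How to access the command line for Tesseract from Python?"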

Answer 1

Here is an example of pytesseract.image_to_string() with options.

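pytesseract.image_to_string(
    Image.open("./imagesStackoverflow/xyz-small-gray.png"),
    lang="eng",
    # boxes=False was the default in older releases; newer ones dropped the parameter.
    # The config string must stay on one line.
    config="--psm 4 --oem 3 -c tessedit_char_whitelist=-01234567890XYZ:",
)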

To use your own trained language data, just replace "eng" in lang="eng" with your language name, that is, the name of your .traineddata file without the extension.
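As a rough sketch of that replacement: Tesseract looks for `<lang>.traineddata` in its tessdata directory, so you can either copy your file there or point Tesseract at the folder with the `--tessdata-dir` option. The helper names `lang_and_config` and `ocr_with_custom_font`, and the path `/opt/tessdata/myfont.traineddata`, are made up for illustration and are not part of pytesseract.

```python
from pathlib import Path


def lang_and_config(traineddata_path, extra_config=""):
    """Derive the lang name and a --tessdata-dir option from a .traineddata path.

    Hypothetical helper for illustration only.
    """
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name!r}")
    lang = path.stem  # "myfont.traineddata" -> "myfont"
    # Tell Tesseract which folder holds the file, then append any other options.
    config = f'--tessdata-dir "{path.parent}" {extra_config}'.strip()
    return lang, config


def ocr_with_custom_font(image_path, traineddata_path):
    """Run OCR on image_path using the font data in traineddata_path."""
    # Imported here so the helper above works without Tesseract installed.
    from PIL import Image
    import pytesseract

    lang, config = lang_and_config(traineddata_path, "--psm 4")
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang, config=config)


if __name__ == "__main__":
    # Assumed location of the trained file; adjust to wherever yours lives.
    print(lang_and_config("/opt/tessdata/myfont.traineddata", "--psm 4"))
```

With the file in place, `ocr_with_custom_font('./test.jpg', '/opt/tessdata/myfont.traineddata')` would call `image_to_string` with `lang="myfont"`. If you copy the file into Tesseract's default tessdata folder instead, passing `lang="myfont"` alone should be enough.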
