Question

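I'm using Python 3.6 on Windows 10 and already have Pytesseract installed, but I came across Tesserocr in some code, and I can't install it. What is the difference between the two?

I have Visual Studio Community 2017 and Anaconda.

The error is as follows: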

load

Answer 1

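pytesseract is just a Python wrapper for tesseract-ocr. So if you want to use tesseract-ocr from Python code without using the subprocess or os modules to run the tesseract-ocr command line yourself, you can use pytesseract. To use it, however, you must also have tesseract-ocr installed.

Think of it this way. You need tesseract-ocr installed because it is the program that actually runs and performs the OCR. If you want to call it as a function from Python code, install the pytesseract package, which lets you do that. So when you run pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'), it invokes tesseract-ocr with the arguments you supplied. The result is the same as running tesseract test-european.jpg stdout -l fra. You can call it from your code, but in the end tesseract-ocr still has to run to do the actual OCR.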
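To make that equivalence concrete, here is a minimal sketch of the kind of code a wrapper like pytesseract saves you from writing by hand. The helper names build_tesseract_command and run_tesseract_cli are hypothetical, the sketch assumes the tesseract executable is on your PATH, and it is not meant to mirror pytesseract's actual implementation.

```python
import subprocess


def build_tesseract_command(image_path, lang=None):
    """Build the tesseract CLI call; 'stdout' makes tesseract print the text."""
    command = ["tesseract", image_path, "stdout"]
    if lang:
        command += ["-l", lang]
    return command


def run_tesseract_cli(image_path, lang=None):
    """Hypothetical helper: run the CLI and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Same request as: tesseract test-european.jpg stdout -l fra
    print(build_tesseract_command("test-european.jpg", lang="fra"))
```

Calling run_tesseract_cli('test-european.jpg', lang='fra') only works if tesseract-ocr and the French language data are installed, which is exactly the dependency described above.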

Answer 2

Pytesseract is a Python "wrapper" around the tesseract binary. It only offers the following functions, along with the ability to specify flags (man page):

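get_tesseract_version returns the version of Tesseract installed on the system.
image_to_string returns the result of a Tesseract OCR run on the image as a string.
image_to_boxes returns a result containing the recognized characters and their box boundaries.
image_to_data returns a result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, see the Tesseract TSV documentation.
image_to_osd returns a result containing information about orientation and script detection.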

For more information, see the project description.
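As an illustration of what the extra information from image_to_data can be used for, here is a small sketch that keeps only the words above a confidence threshold. It assumes the data was requested as a dictionary (output_type=pytesseract.Output.DICT), which gives parallel lists under keys such as "text" and "conf". The function name confident_words and the threshold of 60 are my own choices, and the sample values are made up.

```python
def confident_words(data, min_conf=60.0):
    """Return (text, conf) pairs for non-empty words at or above min_conf.

    `data` is assumed to look like the result of
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT):
    a dict of parallel lists.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as a string or a number
        if text.strip() and conf >= min_conf:
            words.append((text, conf))
    return words


if __name__ == "__main__":
    # Made-up sample shaped like the DICT output, not real OCR results.
    sample = {"text": ["", "Hello", "wrld"], "conf": ["-1", "96", "41"]}
    print(confident_words(sample))
```

With pytesseract installed, the dictionary would come from pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT) rather than from a hand-written sample.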

tesserocr, on the other hand, interfaces directly with Tesseract's C++ API (APIExample), which is more flexible/complex and offers advanced features.

Answer 3

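In my experience, Tesserocr is much faster than Pytesseract.

Tesserocr is a Python wrapper around the Tesseract C++ API, whereas pytesseract is a wrapper around the tesseract-ocr CLI.

So with Tesserocr you can load the model once at the beginning of your program and then run it separately (for example, in a loop to process video). With pytesseract, every call to image_to_string loads the model and then processes the image, so video processing is slower.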
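The load-once pattern might look roughly like the sketch below. The function names ocr_frames and ocr_image_files are hypothetical. It assumes each frame is a PIL image and relies on PyTessBaseAPI being usable as a context manager.

```python
def ocr_frames(api, frames):
    """Run OCR on each frame, reusing one already-initialized API object."""
    texts = []
    for frame in frames:
        api.SetImage(frame)  # swap in the next image; the model stays loaded
        texts.append(api.GetUTF8Text())
    return texts


def ocr_image_files(paths):
    """Hypothetical helper: load the model once, then OCR every file."""
    # Imported here so the sketch can be read without tesserocr installed.
    import tesserocr
    from PIL import Image

    # The context manager initializes Tesseract once and releases it at the end.
    with tesserocr.PyTessBaseAPI() as api:
        return ocr_frames(api, (Image.open(path) for path in paths))
```

The point of splitting it this way is that the expensive step (creating the API object) happens outside the loop, while only SetImage and GetUTF8Text run per frame.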

To install tesserocr, I just typed pip install tesserocr in the terminal.

To use tesserocr:

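import tesserocr
from PIL import Image

# Creating the API object loads the Tesseract model; do this once and reuse it
api = tesserocr.PyTessBaseAPI()
pil_image = Image.open('sample.jpg')
api.SetImage(pil_image)
text = api.GetUTF8Text()
api.End()  # release the native Tesseract resources when finished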

To install pytesseract: pip install pytesseract.

To run it:

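import pytesseract
import cv2

# OpenCV loads images as BGR; convert to RGB before passing to pytesseract
image = cv2.cvtColor(cv2.imread('sample.jpg'), cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)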

What is the difference between Pytesseract and Tesserocr?
