Question

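I am working on a project that uses Python to solve CAPTCHAs, using the pytesseract module. The script runs fine up to a point: it preprocesses the input and writes the new image file, but it always fails on the line text = pytesseract.image_to_string(Image.open(filename)), which extracts the text from the newly created temporary image file. I am using the following script: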

# import the necessary packages
from PIL import Image
import pytesseract
import argparse
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", type=str, default="thresh",
    help="type of preprocessing to be done")
args = vars(ap.parse_args())

# load the example image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# check to see if we should apply thresholding to preprocess the
# image
if args["preprocess"] == "thresh":
    gray = cv2.threshold(gray, 0, 255,
        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# make a check to see if median blurring should be done to remove
# noise
elif args["preprocess"] == "blur":
    gray = cv2.medianBlur(gray, 3)

# write the grayscale image to disk as a temporary file so we can
# apply OCR to it
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)

# load the image as a PIL/Pillow image, apply OCR, and then delete
# the temporary file
text = pytesseract.image_to_string(Image.open(filename))
os.remove(filename)
print(text)

# show the output images
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)


C:\Users\LENOVO\Desktop\ocr>python test.py -i image.jpg
Traceback (most recent call last):
  File "test.py", line 44, in <module>
    text = pytesseract.image_to_string(Image.open(filename))
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 111, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 640, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified


This is my problem. I have searched Google extensively but could not find a suitable solution. Thanks.

Answer 1

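Are you sure the Tesseract software is installed? I got exactly the same error, but once I installed Google Tesseract OCR from this link, your exact script worked fine for me and produced output. I spent a while trying to solve this from within Python, not realizing that this Python library is really just a wrapper. The traceback is consistent with this: the error is raised by subprocess.Popen when pytesseract tries to launch the tesseract executable, not when the image file is opened.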
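Because pytesseract only launches the external tesseract program, one way to confirm this diagnosis is to check whether that executable can be found at all. The following is a minimal sketch, assuming Python 3.3 or later (shutil.which is not available in the Python 2.7 shown in the traceback). The helper name find_tesseract and the Windows install path are hypothetical and only illustrate the idea.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the first usable Tesseract executable path, or None.

    Each candidate is either a bare command name (looked up on PATH)
    or a full path to the executable.
    """
    for candidate in candidates:
        # shutil.which resolves bare names via PATH and also accepts full paths
        found = shutil.which(candidate)
        if found:
            return found
    return None


if __name__ == "__main__":
    # The second entry is a hypothetical install location; adjust it to
    # wherever Tesseract actually lives on your machine.
    path = find_tesseract(
        ("tesseract", r"C:\Program Files\Tesseract-OCR\tesseract.exe")
    )
    if path is None:
        print("Tesseract executable not found; install it or add it to PATH.")
    else:
        print("Found Tesseract at:", path)
        # With pytesseract installed, the wrapper can be pointed at this path:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = path
```

If the check reports that nothing was found, installing Tesseract or adding its folder to PATH should come before any change to the OCR script itself. If Tesseract is installed but not on PATH, setting pytesseract.pytesseract.tesseract_cmd to the full path, as in the commented lines, is the usual workaround.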

You can read the documentation for the Python library, or go to the tesseract GitHub page for more information.

Prerequisites:

Python-tesseract requires Python 2.5+ or Python 3.x

You will need the Python Imaging Library (PIL) (or the Pillow fork). Under Debian/Ubuntu, this is the package python-imaging or python3-imaging.

Install Google Tesseract OCR https://github.com/tesseract-ocr/tesseract

pytesseract.image_to_string() always raises an error
