I'm trying to build a captcha reader for a web scraping project.
I followed the steps described at https://www.scrapehero.com/how-to-solve-simple-captchas-using-python-tesseract/, but I get the error shown below. Can anyone tell me what the problem is? Here is my code:
import pytesseract
import sys
import argparse
try:
    import Image
except ImportError:
    from PIL import Image
from subprocess import check_output


def resolve(path):
    print("Resampling the Image")
    check_output(['convert', path, '-resample', '600', path])
    return pytesseract.image_to_string(Image.open(path))


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('path', help='Captcha file path')
    args = argparser.parse_args()
    path = args.path
    print('Resolving Captcha')
    captcha_text = resolve(path)
    print('Extracted Text', captcha_text)
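For reference, `convert path -resample 600 path` asks ImageMagick to rescale the image to 600 DPI while keeping its physical size, overwriting the original file. The same step can be sketched with Pillow alone, without calling an external program. This is a minimal sketch, not the tutorial's method: `resampled_size` and `resolve_with_pillow` are hypothetical names, and treating files with no DPI metadata as 72 DPI is an assumption meant to mirror ImageMagick's usual default density.

```python
"""Sketch: perform the '-resample 600' step in Pillow instead of an external `convert`."""
from typing import Tuple

# Assumption: images without (or with zero) DPI metadata are treated as 72 DPI.
DEFAULT_DPI = 72.0


def resampled_size(size: Tuple[int, int], src_dpi: Tuple[float, float],
                   target_dpi: float = 600.0) -> Tuple[int, int]:
    """Pixel size after resampling from src_dpi to target_dpi at constant physical size."""
    width, height = size
    # Fall back to DEFAULT_DPI when the stored resolution is missing or zero.
    x_dpi = float(src_dpi[0]) or DEFAULT_DPI
    y_dpi = float(src_dpi[1]) or DEFAULT_DPI
    return (max(1, round(width * target_dpi / x_dpi)),
            max(1, round(height * target_dpi / y_dpi)))


def resolve_with_pillow(path: str, target_dpi: float = 600.0) -> str:
    """Hypothetical replacement for resolve(): resample in memory, then run OCR."""
    # Imported lazily so resampled_size() can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        src_dpi = img.info.get("dpi", (DEFAULT_DPI, DEFAULT_DPI))
        new_size = resampled_size(img.size, src_dpi, target_dpi)
        # LANCZOS is a high-quality filter, suitable for upscaling small images.
        enlarged = img.resize(new_size, Image.LANCZOS)
        return pytesseract.image_to_string(enlarged)
```

Unlike the `convert` call, this version leaves the captcha file on disk unchanged, since the enlarged copy only exists in memory.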
The expected result is the 6-letter captcha printed as text. Instead, I get this error:
Resolving Captcha
Resampling the Image
Invalid Parameter - -resample
Traceback (most recent call last):
  File "captcha_resolver.py", line 25, in <module>
    captcha_text = resolve(path)
  File "captcha_resolver.py", line 14, in resolve
    check_output(['convert', path, '-resample', '600', path])
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 395, in check_output
    **kwargs).stdout
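The message `Invalid Parameter - -resample` does not look like an ImageMagick error. It matches the output of the Windows built-in `convert.exe` (the FAT-to-NTFS filesystem converter in `C:\Windows\System32`), which most likely takes precedence over ImageMagick's `convert` on `PATH`. Below is a minimal sketch of one way around this, assuming ImageMagick 7 is installed (it provides a `magick` executable that accepts the same arguments as `convert`). `find_imagemagick` and `resample_command` are hypothetical helper names, not part of any library.

```python
"""Sketch: locate ImageMagick without picking up Windows' own convert.exe."""
import os
import shutil
import subprocess
from typing import Callable, List, Optional


def find_imagemagick(which: Callable[[str], Optional[str]] = shutil.which,
                     os_name: str = os.name) -> Optional[List[str]]:
    """Return the ImageMagick command prefix, or None if it is not found."""
    # ImageMagick 7 ships a single `magick` executable.
    magick = which("magick")
    if magick:
        return [magick]
    # On Windows a bare `convert` usually resolves to System32\convert.exe,
    # so only fall back to `convert` on other platforms.
    if os_name != "nt":
        convert = which("convert")
        if convert:
            return [convert]
    return None


def resample_command(prefix: List[str], path: str, dpi: int = 600) -> List[str]:
    """Build the same arguments the original resolve() passes to convert."""
    return prefix + [path, "-resample", str(dpi), path]


def resample(path: str, dpi: int = 600) -> None:
    prefix = find_imagemagick()
    if prefix is None:
        raise RuntimeError("ImageMagick (magick/convert) not found on PATH")
    subprocess.check_output(resample_command(prefix, path, dpi))
```

Injecting `which` and `os_name` as parameters is only a convenience so the lookup can be checked without ImageMagick being installed.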
Other people who tried this have run into similar problems, as seen here: https://gist.github.com/scrapehero/b85a280dc0d993f665c91e0332cf618f