Bypassing a captcha with Python

Date: 2018-03-07 20:17:00

Tags: python python-imaging-library tesseract captcha

I am using Tesseract, PIL, and ImageMagick to bypass a captcha.

Here is the code:

import pytesseract
import sys
import argparse
try:
    import Image
except ImportError:
    from PIL import Image
from subprocess import check_output


def resolve(path):
    print("Resampling the Image")
    check_output(['convert', path, '-resample', '600', path])
    return pytesseract.image_to_string(Image.open(path))

argparser = argparse.ArgumentParser()
argparser.add_argument('/Users/rodrigopeniche/Documents/workspace/WebScraping/captcha.png',help = 'Captcha file path')
args = argparser.parse_args()
path = args.path
print('Resolving Captcha')
captcha_text = resolve(path)
print('Extracted Text', captcha_text)

First, is there any way to run this code without having to pass the file location on the command line?

Second, I get this error when I run it:

Traceback (most recent call last):
  File "/Users/rodrigopeniche/Documents/workspace/WebScraping/captchabypasser.py", line 20, in <module>
    path = args.path
AttributeError: 'Namespace' object has no attribute 'path'

1 answer:

Answer 0: (score: 0)

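The problem is that you never define an argument named path. The first string passed to add_argument is the argument's name, so in your code the parsed value is stored under the file path string instead of under path. Change the add_argument line to:

argparser.add_argument('path', help='Captcha file path')

# Or, if you want to make the command-line parameter optional, give it a default.
# A positional argument also needs nargs='?' for the default to be used when it is omitted.
argparser.add_argument('path', nargs='?', default='/Users/rodrigopeniche/Documents/workspace/WebScraping/captcha.png', help='Captcha file path')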

Of course, if you are not interested in parsing command-line arguments, you can simply set the path manually.
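
To make the cause of the AttributeError concrete, here is a minimal sketch that builds both parsers and compares the resulting Namespace objects. The file names ("some/dir/captcha.png", "captcha.png", "x.png", "other.png") and the helper functions are hypothetical placeholders for illustration only. The sketch leaves out the ImageMagick and Tesseract calls and passes the argument lists to parse_args explicitly so that it runs without a real command line.

```python
import argparse

# Hypothetical default used only for this sketch; substitute your own file.
DEFAULT_CAPTCHA = "captcha.png"


def build_broken_parser():
    # Mirrors the question: the file path is passed as the argument *name*,
    # so argparse stores the parsed value under that string, not under 'path'.
    parser = argparse.ArgumentParser()
    parser.add_argument("some/dir/captcha.png", help="Captcha file path")
    return parser


def build_fixed_parser():
    # 'path' is the argument name; nargs='?' makes the positional optional,
    # so the default applies when nothing is given on the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument("path", nargs="?", default=DEFAULT_CAPTCHA,
                        help="Captcha file path")
    return parser


broken = build_broken_parser().parse_args(["x.png"])
print(vars(broken))             # the only attribute is named after the path string
print(hasattr(broken, "path"))  # there is no 'path' attribute to read

fixed = build_fixed_parser()
print(fixed.parse_args([]).path)             # falls back to DEFAULT_CAPTCHA
print(fixed.parse_args(["other.png"]).path)  # an explicit argument wins
```

With the fixed parser, running the script with no arguments uses the default file, which also answers the first part of the question. If you skip argparse altogether, assigning the path to a variable before calling resolve has the same effect.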