I'm using Tesseract, PIL and ImageMagick to bypass a captcha.
Here is the code:
```python
import pytesseract
import sys
import argparse
try:
    import Image
except ImportError:
    from PIL import Image
from subprocess import check_output


def resolve(path):
    print("Resampling the Image")
    check_output(['convert', path, '-resample', '600', path])
    return pytesseract.image_to_string(Image.open(path))


argparser = argparse.ArgumentParser()
argparser.add_argument('/Users/rodrigopeniche/Documents/workspace/WebScraping/captcha.png',help = 'Captcha file path')
args = argparser.parse_args()
path = args.path
print('Resolving Captcha')
captcha_text = resolve(path)
print('Extracted Text', captcha_text)
```
First, is there any way to run this code without having to pass the file location on the command line?
Second, when I run it I get this error:
```
Traceback (most recent call last):
  File "/Users/rodrigopeniche/Documents/workspace/WebScraping/captchabypasser.py", line 20, in <module>
    path = args.path
AttributeError: 'Namespace' object has no attribute 'path'
```
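The error can be reproduced without any image or OCR dependencies. For a positional argument, argparse uses the first string given to `add_argument` as the attribute name (`dest`), so the value is stored under that string rather than under `path`. A minimal sketch, assuming a shortened hypothetical argument name `captcha.png` in place of the full path, and passing an explicit list to `parse_args` instead of reading `sys.argv`:

```python
import argparse

# Hypothetical stand-in for the long file path the question uses as the argument name
NAME = 'captcha.png'

parser = argparse.ArgumentParser()
# For a positional argument, this string becomes the attribute name (dest)
parser.add_argument(NAME, help='Captcha file path')

# Explicit argument list instead of sys.argv
args = parser.parse_args(['some_image.png'])

print(vars(args))             # {'captcha.png': 'some_image.png'}
print(hasattr(args, 'path'))  # False, which is why args.path raises AttributeError
```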
Answer:
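The problem is that you never define an argument named `path`. For a positional argument, argparse stores the value under the name you pass to `add_argument`, which in your code is the whole file path. Change the `add_argument` line to:

```python
argparser.add_argument('path', help='Captcha file path')
# Or, if you want to make the command-line parameter optional, give it a default.
# A positional argument also needs nargs='?', otherwise it is still required.
argparser.add_argument('path', nargs='?', default='/Users/rodrigopeniche/Documents/workspace/WebScraping/captcha.png', help='Captcha file path')
```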
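The optional version can be checked without running the OCR code by passing explicit argument lists to `parse_args`. This is a sketch; `DEFAULT_PATH` is a name introduced here for the path from the question:

```python
import argparse

DEFAULT_PATH = '/Users/rodrigopeniche/Documents/workspace/WebScraping/captcha.png'

parser = argparse.ArgumentParser()
# nargs='?' lets the positional be omitted; default is used when it is
parser.add_argument('path', nargs='?', default=DEFAULT_PATH,
                    help='Captcha file path')

# Nothing on the command line: falls back to the default
print(parser.parse_args([]).path)
# An explicit argument overrides the default
print(parser.parse_args(['other.png']).path)
```

In the real script, `argparser.parse_args()` with no list reads `sys.argv`, so running the script with no arguments would use the default path. An optional flag such as `--path` with a `default=` would also work, since arguments starting with `-` are optional unless `required=True` is set.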
Of course, if you're not interested in parsing command-line arguments, you can just set the path manually.
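Without argparse, the path can simply be assigned before calling `resolve(path)`. The sketch below goes one small step further and still accepts an optional path from `sys.argv`; the helper `choose_path` and the constant `DEFAULT_PATH` are hypothetical names, not part of the original script:

```python
import sys

# Hard-coded location taken from the question
DEFAULT_PATH = '/Users/rodrigopeniche/Documents/workspace/WebScraping/captcha.png'


def choose_path(argv):
    """Return argv[1] if a path was passed on the command line, else DEFAULT_PATH."""
    # argv[0] is the script name, so a user-supplied path would be argv[1]
    return argv[1] if len(argv) > 1 else DEFAULT_PATH


path = choose_path(sys.argv)
print('Using', path)
```

With `path` set this way, `captcha_text = resolve(path)` works unchanged.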