After my spider receives a response, I want to download and display the captcha image and then continue crawling:
def get_captcha(self, response):
    print('\nLoading captcha...\n')
    item = CaptchaItem()
    hxs = HtmlXPathSelector(response)
    # Take the src attribute of the element with id "captcha-image"
    captcha_img_src = hxs.select('//*[@id="captcha-image"]/@src').extract()[0]
    item['image_urls'] = [captcha_img_src]
    return item
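But I don't know when the image gets loaded, or how to continue crawling after that.
FYI: the captcha cannot be downloaded without cookies.
Thanks in advance!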
Answer 0: (score: 0)
Use yield instead of return:
def get_captcha(self, response):
    print('\nLoading captcha...\n')
    item = CaptchaItem()
    hxs = HtmlXPathSelector(response)
    captcha_img_src = hxs.select('//*[@id="captcha-image"]/@src').extract()[0]
    item['image_urls'] = [captcha_img_src]
    yield item
    # You may display your scraped item here, and after that
    # your further POST request goes here...
    yield your_request
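
To see why yield helps here, the ordering can be illustrated without Scrapy at all. The following is only a minimal sketch in plain Python: `FollowUp`, `run_callback`, the example URLs, and the `after_captcha` name are all hypothetical stand-ins, not Scrapy API. It shows that a generator callback can hand over the item first and then a follow-up request, whereas return would end the function at the item.

```python
from collections import namedtuple

# Hypothetical stand-in for a follow-up request; NOT Scrapy's Request class.
FollowUp = namedtuple("FollowUp", ["url", "callback_name"])


def get_captcha(captcha_src):
    """Generator callback: emits the item first, then a follow-up request."""
    item = {"image_urls": [captcha_src]}
    yield item  # handed to the consumer; execution resumes here afterwards
    # Assumed URL and callback name, for illustration only.
    yield FollowUp(url="http://example.com/login", callback_name="after_captcha")


def run_callback(results):
    """Toy consumer: sorts whatever the callback yields into items and requests."""
    items, requests = [], []
    for result in results:
        if isinstance(result, FollowUp):
            requests.append(result)
        else:
            items.append(result)
    return items, requests


items, requests = run_callback(get_captcha("http://example.com/captcha.png"))
print(items)
print(requests)
```

Note that this only demonstrates the order in which things are yielded. It says nothing about when an image pipeline actually finishes downloading the captcha, so the follow-up request should not assume the file is already on disk.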