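I am using the render.html endpoint together with SplashFormRequest.from_response to scrape an ASP.NET-based website, but I cannot make consecutive SplashFormRequest.from_response calls without losing the session.
I tried setting the cookies in args, in meta, and through a cookiejar, all without success. Here is part of my code: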
def start_requests(self):
    script = """
    function main(splash, args)
        splash:init_cookies(splash.args.cookies)
        splash.images_enabled = false
        splash:go(args.url)
        splash:wait(3)
        return {
            html = splash:html(),
            cookies = splash:get_cookies(),
        }
    end"""
    request = SplashRequest(url=url, callback=self.parse, endpoint='execute',
                            args={'lua_source': script,
                                  'url': url})
    request.meta['splash']['session_id'] = self.session
    yield request

def parse(self, response):
    request = SplashFormRequest.from_response(response, url=url, formdata=data,
                                              callback=self.parse2,
                                              endpoint='render.html',
                                              args={'images': 0})
    request.cookies = response.data['cookies']
    request.meta['splash']['session_id'] = self.session
    yield request
Is there a way to make SplashFormRequest.from_response use manually set cookies, so that I can chain requests like SplashFormRequest.from_response > SplashFormRequest.from_response without losing the session?
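For reference, the scrapy-splash README describes session handling as working only through the execute endpoint, with a Lua script that accepts a cookies argument and returns a cookies field. The form request in parse above uses render.html, which may be why the session is dropped. Below is a minimal, untested sketch of that pattern, assuming the form POST can be replayed by forwarding the http_method, body, and headers arguments that scrapy-splash fills in. The names LUA_SESSION_SCRIPT and splash_session_kwargs are hypothetical and only serve the illustration.

```python
# Lua script following the session pattern from the scrapy-splash README:
# it restores the incoming cookies, replays the request (including the POST
# body built by SplashFormRequest), and returns the updated cookies.
LUA_SESSION_SCRIPT = """
function main(splash)
  splash:init_cookies(splash.args.cookies)
  splash.images_enabled = false
  assert(splash:go{
    splash.args.url,
    headers=splash.args.headers,
    http_method=splash.args.http_method,
    body=splash.args.body,
  })
  assert(splash:wait(3))
  return {
    html = splash:html(),
    cookies = splash:get_cookies(),
  }
end
"""


def splash_session_kwargs(lua_source=LUA_SESSION_SCRIPT):
    """Hypothetical helper: keyword arguments that route a request through
    the execute endpoint with a cookie-aware script."""
    return {
        "endpoint": "execute",
        "args": {"lua_source": lua_source},
    }


# Possible usage inside the spider (assumes scrapy_splash is installed and
# its middlewares, including the cookies middleware, are enabled):
#
#   request = SplashFormRequest.from_response(
#       response, formdata=data, callback=self.parse2,
#       **splash_session_kwargs())
#   request.meta['splash']['session_id'] = self.session
#   yield request
```

With this approach every request in the chain goes through the same script, so the cookies returned by one response are passed into the next request under the same session_id instead of being assigned to request.cookies by hand.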