How do I handle cookies with render.html and SplashFormRequest.from_response?

Date: 2019-05-09 12:10:34

Tags: python scrapy splash scrapy-splash

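I am using the render.html endpoint together with SplashFormRequest.from_response to scrape an ASP.NET-based website, but I cannot issue several SplashFormRequest.from_response calls in a row without losing the session.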

I have tried setting the cookies through args, meta, and the cookiejar, all without success. Here is part of my code:

def start_requests(self):
    script = """
        function main(splash, args)
            splash:init_cookies(splash.args.cookies)
            splash.images_enabled = false
            splash:go(args.url)
            splash:wait(3)

            return {
                html = splash:html(),
                cookies = splash:get_cookies(),
            }
        end"""
    request = SplashRequest(url=url, callback=self.parse, endpoint='execute',
                            args={'lua_source': script,
                                  'url': url})
    request.meta['splash']['session_id'] = self.session
    yield request

def parse(self, response):
    request = SplashFormRequest.from_response(response, url=url, formdata=data,
                                              callback=self.parse2,
                                              endpoint='render.html',
                                              args={'images': 0})
    request.cookies = response.data['cookies']
    request.meta['splash']['session_id'] = self.session
    yield request

Is there a way to set cookies manually on SplashFormRequest.from_response, so that one form request can follow another (SplashFormRequest.from_response > SplashFormRequest.from_response) within the same session?
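One direction that may be worth trying, offered as a sketch rather than a confirmed fix: the scrapy-splash README describes its session handling as working only with the execute endpoint and a Lua script that accepts a cookies argument and returns a cookies field. The second request in the code above goes to render.html, so the session cookies are probably never applied there. The snippet below builds keyword arguments that route the form submission through execute as well. The helper name splash_form_kwargs and the constant SESSION_SCRIPT are hypothetical, the Lua script is modeled on the README example, and passing session_id as a keyword argument is an assumption about the SplashRequest constructor (setting request.meta['splash']['session_id'], as in the question, is the documented alternative).

```python
# Lua script modeled on the session example in the scrapy-splash README.
SESSION_SCRIPT = """
function main(splash)
  -- restore the cookies that scrapy-splash sends for this session
  splash:init_cookies(splash.args.cookies)
  splash.images_enabled = false
  -- replay the request; for a form POST the method, body and headers
  -- are expected to arrive in splash.args
  assert(splash:go{
    splash.args.url,
    headers=splash.args.headers,
    http_method=splash.args.http_method,
    body=splash.args.body,
  })
  assert(splash:wait(3))
  return {
    html = splash:html(),
    cookies = splash:get_cookies(),
  }
end
"""


def splash_form_kwargs(session_id):
    """Hypothetical helper: keyword arguments intended for SplashRequest or
    SplashFormRequest.from_response so that both use the execute endpoint
    and share one cookie session."""
    return {
        "endpoint": "execute",
        "args": {"lua_source": SESSION_SCRIPT},
        # Assumption: SplashRequest accepts session_id as a keyword argument.
        "session_id": session_id,
    }


# Possible usage inside the spider (assumes scrapy-splash and its
# middlewares are configured; `data` and `self.session` come from the
# question's own code):
#
# def parse(self, response):
#     yield SplashFormRequest.from_response(
#         response, formdata=data, callback=self.parse2,
#         **splash_form_kwargs(self.session))
```

With this arrangement the manual request.cookies assignment should no longer be needed, since the cookies returned by each execute response are meant to be merged back into the session's cookiejar by the scrapy-splash cookies middleware.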

0 answers:

No answers