Scrapy: how to get cookies from Splash

Time: 2018-08-01 13:31:50

Tags: python scrapy splash-screen

I am trying to get the cookies from a Splash request, but I keep getting an error.

Here is the code I am using:


Here is my spider:

import scrapy
from scrapy_splash import SplashRequest


class P2PEye(scrapy.Spider):
    name = 'p2peyeSpider'
    allowed_domains = ['p2peye.com']
    start_urls = ['https://www.p2peye.com/platform/h9/']

    def start_requests(self):
        script = '''
        function main(splash)
          local url = splash.args.url
          assert(splash:go(url))
          assert(splash:wait(0.5))
          return {
            cookies = splash:get_cookies(),
          }
        end
        '''
        for url in self.start_urls:
            yield SplashRequest(url, callback=self.parse, endpoint='render.html',
                                args={'wait': 1, 'lua_source': script})

    def parse(self, response):
        print(response.request.headers.getlist('Set-Cookie'))
        print(response.cookiejar)

settings.py

SPLASH_URL = 'http://127.0.0.1:8050'
CRAWLERA_ENABLED = False

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
COOKIES_ENABLED = True
COOKIES_DEBUG = True
SPLASH_COOKIES_DEBUG = True

The result of response.request.headers.getlist('Set-Cookie') is [], and accessing response.cookiejar raises an error. So how can I get the cookies without causing an error?

1 answer:

Answer 0 (score: 1)

To access response.cookiejar, the request needs to come back as a SplashJsonResponse. Note also that a lua_source script is only executed by Splash's 'execute' endpoint; with endpoint='render.html' the script is ignored entirely.

Try returning extra fields from the Lua script:

script = '''
        function main(splash)
          local url = splash.args.url
          assert(splash:go(url))
          assert(splash:wait(0.5))
          local entries = splash:history()
          local last_response = entries[#entries].response
          return {
            url = splash:url(),
            headers = last_response.headers,
            http_status = last_response.status,
            cookies = splash:get_cookies(),
            html = splash:html(),
          }
        end
        '''
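Once the script returns those fields as a JSON object, scrapy-splash wraps the result in a SplashJsonResponse: the raw list from splash:get_cookies() is available as response.data['cookies'], and SplashCookiesMiddleware populates response.cookiejar. A minimal sketch of reading the cookies in the callback (the helper name cookies_from_splash is my own; Splash returns HAR-style cookie dicts with "name" and "value" keys):

```python
def cookies_from_splash(cookie_list):
    """Flatten Splash's HAR-style cookie dicts into a simple name -> value map."""
    return {c["name"]: c["value"] for c in cookie_list}


# Inside the spider, once the Lua script returns the extra fields and the
# request uses endpoint='execute', the response is a SplashJsonResponse:
#
# def parse(self, response):
#     raw = response.data["cookies"]          # list of dicts from splash:get_cookies()
#     simple = cookies_from_splash(raw)
#     self.logger.info("cookies: %s", simple)
#     # response.cookiejar is also available here without raising an error
```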