Question

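From browsing around, it seems that if you log in to a website through Scrapy and then try to use Selenium inside the spider, the authenticated login session is not carried over. Is there a way to transfer that session to Selenium, or do I have to log in to the site again with Selenium?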

Thanks!

Answer 1

The session is most likely just your cookies. So transferring the session to the Selenium webdriver comes down to setting the cookies from your Scrapy requests in Selenium.

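Scrapy keeps track of cookies on its own: the cookies the server set on the current response are in response.headers (under Set-Cookie), and the cookies Scrapy sent with the request are in response.request.headers (under Cookie).
You can then set these cookies on your webdriver. Load a page on the same domain first, because WebDriver only accepts cookies for the domain that is currently open: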

driver.add_cookie({'name': 'foo', 'value': 'bar'})

You can turn each Set-Cookie header in response.headers into the name/value dictionary that add_cookie expects, for example:

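# Scrapy's Headers.getlist() returns every Set-Cookie value as bytes
for raw in response.headers.getlist('Set-Cookie'):
    # keep only the leading "name=value" pair, dropping Path, Expires, etc.
    name, _, value = raw.decode('utf-8').split(';', 1)[0].partition('=')
    driver.add_cookie({'name': name.strip(), 'value': value.strip()})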
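A slightly fuller sketch, assuming you also want to keep the Path, Domain, Secure and HttpOnly attributes: the hypothetical helper set_cookie_to_selenium below converts one Set-Cookie value into a dict using keys that Selenium's add_cookie accepts (name, value, path, domain, secure, httpOnly). It deliberately ignores Expires and Max-Age, so the browser will treat each copied cookie as a session cookie.

```python
def set_cookie_to_selenium(header_value):
    """Convert one Set-Cookie header value into a Selenium cookie dict (sketch)."""
    if isinstance(header_value, bytes):
        # Scrapy stores header values as bytes
        header_value = header_value.decode("latin-1")
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    # The first part is always the cookie's own name=value pair;
    # partition() splits on the first "=" only, so values may contain "="
    name, _, value = parts[0].partition("=")
    cookie = {"name": name.strip(), "value": value.strip()}
    # The remaining parts are attributes; keep the ones Selenium understands
    for attr in parts[1:]:
        key, _, attr_value = attr.partition("=")
        key = key.strip().lower()
        if key == "path":
            cookie["path"] = attr_value.strip()
        elif key == "domain":
            cookie["domain"] = attr_value.strip()
        elif key == "secure":
            cookie["secure"] = True
        elif key == "httponly":
            cookie["httpOnly"] = True
        # Expires, Max-Age, SameSite, etc. are ignored in this sketch
    return cookie
```

In a spider this could be called for each value of response.headers.getlist('Set-Cookie'), passing the result to driver.add_cookie once the driver is on the cookie's domain.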

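Note: some websites use more complex sessions that also require other headers to match. You can replicate those as well by making the Selenium webdriver send the same headers that Scrapy sent with its request (for example the same User-Agent).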
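For the header case in the note above, a minimal sketch assuming Chrome: the WebDriver API itself has no general way to set arbitrary request headers, but Chrome accepts a --user-agent command-line switch. The hypothetical helper chrome_args_from_headers picks the User-Agent out of either a Scrapy-style headers mapping (bytes keys, lists of bytes values) or a plain dict and returns the matching Chrome argument; all other headers are ignored here.

```python
def chrome_args_from_headers(headers):
    """Build Chrome command-line arguments that mirror selected request headers."""
    args = []
    for key, value in headers.items():
        if isinstance(key, bytes):
            key = key.decode("latin-1")
        if isinstance(value, (list, tuple)):
            # Scrapy's Headers.items() yields a list of values per header;
            # use the last one, as Scrapy's own header lookup does
            value = value[-1]
        if isinstance(value, bytes):
            value = value.decode("latin-1")
        if key.lower() == "user-agent":
            args.append("--user-agent=" + value)
    return args
```

Each returned string could be passed to ChromeOptions.add_argument before creating webdriver.Chrome(options=options), using response.request.headers as the input.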

Answer 2

See a similar question here: scrapy selenium authentication

Log in with the Scrapy API:

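from scrapy.http import FormRequest

# (inside a spider callback) submit the login form as a Scrapy POST request;
# self.browse_files is called with the logged-in response
return FormRequest.from_response(
    response,
    # formxpath=formxpath,
    formdata=formdata,  # dict of login form fields, e.g. username/password
    callback=self.browse_files
)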

Pass the session to the Selenium driver:

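from selenium.webdriver.common.by import By

# logged in previously with the Scrapy API (partial solution)
# cookie2 is assumed to hold the Cookie header string of the logged-in
# Scrapy request, e.g. "name1=value1; name2=value2"
cookies = [c.strip() for c in cookie2.split(";") if c.strip()]

# WebDriver only accepts cookies for the domain currently loaded
self.driver.get(response.url)
for cookie in cookies:
    name, _, value = cookie.partition("=")
    cookie_map = {"name": name, "value": value}
    print("adding cookie", cookie_map)
    self.driver.add_cookie(cookie_map)

# reload so the page is requested with the session cookies
self.driver.get(response.url)

# wait_for_elements_to_be_present is the answer author's own helper
files = self.wait_for_elements_to_be_present(By.XPATH, "//*[@id='files']", response)
print(files)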
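The cookie loop above can be factored into a small helper that is easy to test without a real browser. This is a sketch under the assumption that the Cookie header is available as a plain string (for example decoded from response.request.headers); transfer_cookies is a hypothetical name, and driver can be anything with get() and add_cookie(), such as a Selenium WebDriver.

```python
def transfer_cookies(driver, url, cookie_header):
    """Copy a "name1=value1; name2=value2" Cookie header into a WebDriver session.

    Returns the list of cookie dicts that were added.
    """
    # WebDriver only accepts cookies for the domain currently loaded,
    # so open the site once before adding anything
    driver.get(url)
    added = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments such as a trailing "; "
        name, _, value = pair.partition("=")
        cookie = {"name": name.strip(), "value": value.strip()}
        driver.add_cookie(cookie)
        added.append(cookie)
    # Reload so the page is requested again with the session cookies
    driver.get(url)
    return added
```

Keeping the driver as a parameter means a stub object can stand in for Selenium in tests, while in the spider you would pass self.driver.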

Using Selenium after an authenticated login session with Scrapy
