Question

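I am working with a Scrapy spider that authenticates through a login form when it starts. It then scrapes using this authenticated session.

During development I usually run the spider many times to test it. Authenticating at the start of every run hits the site's login form. The site often forces a password reset in response, and I suspect it will ban the account if this continues.

Because the cookies last for several hours, there is no good reason to log in this often during development. To get around the password reset problem, what is the best way to reuse an authenticated session/cookies between runs while developing? Ideally, the spider would only attempt to authenticate if the persisted session has expired.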

Edit:

My structure is as follows:

def start_requests(self):
    # Fetch the base page first so the login form can be filled in.
    yield scrapy.Request(self.base, callback=self.log_in)

def log_in(self, response):
    # response.headers includes 'Set-Cookie': 'JSESSIONID=xx; Path=/cas/; Secure; HttpOnly'
    yield scrapy.FormRequest.from_response(
        response,
        formdata={'username': 'xxx', 'password': ''},
        callback=self.logged_in,
    )

def logged_in(self, response):
    # request.headers and all subsequent requests have the header field 'Cookie': 'JSESSIONID=xxx'
    # response.headers has no mention of cookies
    # request.cookies is empty
    pass

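When I make the same page request in Chrome, about 20 fields are listed under the "Cookies" tab.

The documentation seems thin here. I have tried setting the field 'Cookie': 'JSESSIONID=xxx' in the headers dict of all outgoing requests, based on the values returned by a successful login, but this bounces back to the login screen.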

Answer 1

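It turns out that, for an ad hoc development solution, this is easier than I thought. Get the cookie string with cookieString = request.headers['Cookie'], save it, then on subsequent outgoing requests load it and do: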

request.headers.appendlist('Cookie', cookieString)
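
The "save it" and "load it" steps are not spelled out above, so here is a minimal sketch of one way they could look, using only the standard library. The file name `dev_session_cookie.json`, the two-hour cutoff, and the helper names `save_cookie_string` and `load_cookie_string` are all assumptions made for illustration; none of them come from Scrapy or from the original post.

```python
import json
import time
from pathlib import Path
from typing import Optional, Union

# Hypothetical defaults: pick a file location and lifetime that suit your site.
COOKIE_FILE = Path("dev_session_cookie.json")
MAX_AGE_SECONDS = 2 * 60 * 60


def save_cookie_string(cookie_string: Union[str, bytes],
                       path: Path = COOKIE_FILE) -> None:
    """Store the Cookie header value together with the time it was saved."""
    if isinstance(cookie_string, bytes):
        # Scrapy header values are bytes, so decode before writing JSON.
        cookie_string = cookie_string.decode("utf-8")
    payload = {"cookie": cookie_string, "saved_at": time.time()}
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_cookie_string(path: Path = COOKIE_FILE,
                       max_age: float = MAX_AGE_SECONDS) -> Optional[str]:
    """Return the saved cookie string, or None if it is missing or too old."""
    if not path.exists():
        return None
    payload = json.loads(path.read_text(encoding="utf-8"))
    if time.time() - payload["saved_at"] > max_age:
        return None
    return payload["cookie"]
```

With helpers like these, logged_in could call save_cookie_string(response.request.headers['Cookie']), and start_requests could call load_cookie_string() first, going through log_in only when it returns None. The age cutoff is only a guess at when the session expires; the server may invalidate the session sooner, so it is probably still worth checking whether a response has bounced back to the login page.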

Persisting an authenticated session between crawls in Scrapy for development
