Python requests.session cookie retrieval problem

Time: 2018-08-04 20:35:51

Tags: python-3.x python-requests session-cookies

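I am trying to scrape data from a website, and there seems to be a problem getting cookies from the site when I use requests.session. The code below explains it better.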

import requests

# Browser-like User-Agent; the site only seems to set cookies when one is sent.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'}
url = "https://www.nseindia.com"

# Plain requests, first without and then with the custom headers.
r_without_headers = requests.get(url)
print("response code",r_without_headers.status_code)
print("Req no header cookies ",r_without_headers.cookies.get_dict())
r_with_headers = requests.get(url,headers = headers)
print("response code",r_with_headers.status_code)
print("Req with header cookies ",r_with_headers.cookies.get_dict())

# Session without custom headers.
s1 = requests.session()
s1_req = s1.get(url)
print("response code",s1_req.status_code)
print("Session no header Cookies ", s1.cookies.get_dict())
print("Session no header Request Cookies", s1_req.cookies.get_dict())

# Session with the custom headers assigned to it.
s2 = requests.session()
s2.headers = headers
s2_req = s2.get(url)
print("response code",s2_req.status_code)
print("Session with header Cookies ", s2.cookies.get_dict())
print("Session with header Request Cookies", s2_req.cookies.get_dict())

Output

response code 200
Req no header cookies  {}
response code 200
Req with header cookies  {'ak_bmsc': 'F4040D045001A7CD57BBC58C09C9117F174C9D8E21750000240B665BDDE23467~plhUo272BWU9CTPiQAEJgiZ07qX/BOE0n6iOU8y9pewbmXipo8de1YROpMw6AEtjQDgdt3x+M/2QDATjSAtaRiDVlsDGZfohfsymElg0Xpq0Uta3OYSOSe2B48eg2lJD0CMios+0eqatEro6XvEkYAy+4D14EUHAE/eRp5oVUOpVL6JR8WMNNFoE6Xo7xYQtfLFu8hS1sUNABrYkr6XNFGY3YnkZmawa7imZswMI4tICc='}
response code 200
Session no header Cookies  {}
Session no header Request Cookies {}
response code 200
Session with header Cookies  {}
Session with header Request Cookies {}

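Problem

The website apparently requires a User-Agent to be set before it serves cookies: when I make a get request with the User-Agent set I get the expected cookie, and without it I don't.

When I try the same thing with requests.session, I get no response cookies at all, whether or not the headers are included.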

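Questions

Why is this happening? Am I using the session incorrectly, or is the website broken? (I would not be surprised if that were the case.)

How do I get the cookies using a session?

My current workaround is to send a plain get request to retrieve the cookies and then set them on the session manually. That does not seem right, though: if the cookies are modified by a later request, there is no guarantee the session will pick up the change, since the session could not retrieve the cookies in the first place. I would rather not write all of my code with bare requests and carry the cookies over to each subsequent request by hand.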
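
For reference, the workaround described above can be sketched roughly as follows. So that the snippet runs without contacting the site, a hand-built cookie jar stands in for `requests.get(url, headers=headers).cookies`; the name `response_cookies` and the cookie value `example-value` are placeholders for illustration only.

```python
import requests
from requests.cookies import RequestsCookieJar

# Stand-in for the cookies of a one-off `requests.get(url, headers=headers)`
# response. The value is a placeholder, not a real cookie from the site.
response_cookies = RequestsCookieJar()
response_cookies.set("ak_bmsc", "example-value",
                     domain="www.nseindia.com", path="/")

session = requests.Session()
# Copy every cookie from the one-off response into the session's own jar.
session.cookies.update(response_cookies)

print(session.cookies.get_dict())  # {'ak_bmsc': 'example-value'}
```

This only seeds the session's jar once; it does nothing about why the session's own requests come back without cookies.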

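1 answer:

Answer 0 (score: 1)

The problem is that you are overwriting the session's headers object with a dict of your own (the session's headers object is not actually a plain dict).

Instead, just update it: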

s2.headers.update(headers)

For example:

import requests
url = "https://www.nseindia.com"
s2 = requests.session()
s2.headers.update({'User-Agent': 'Mozilla/5.0'})
s2_req = s2.get(url)
print("Session with header Cookies ", s2.cookies.keys())

happily outputs

Session with header Cookies  ['ak_bmsc']
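
The difference between assigning and updating can be checked locally, without contacting the site. The sketch below is only an illustration: the names `custom`, `replaced` and `updated` are made up for it. It shows that assignment swaps the session's case-insensitive header object for a plain dict and drops the library's default headers, while `update` keeps both. Whether the missing defaults are what makes this particular server withhold its cookie is an assumption, not something the snippet verifies.

```python
import requests

custom = {"User-Agent": "Mozilla/5.0"}

# Assignment: the session's header object is replaced by the plain dict,
# so the default headers that requests normally sends are gone.
replaced = requests.Session()
replaced.headers = custom

# Update: the session keeps its own header object and its defaults;
# only the User-Agent entry is overridden.
updated = requests.Session()
updated.headers.update(custom)

print(type(replaced.headers).__name__)        # dict
print(type(updated.headers).__name__)         # CaseInsensitiveDict
print("Accept-Encoding" in replaced.headers)  # False
print("Accept-Encoding" in updated.headers)   # True
```

With `update`, requests made through the session still carry the default `Accept`, `Accept-Encoding` and `Connection` headers alongside the custom User-Agent.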