Question

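I'm using requests to scrape some content from a web page. When I use

import requests
requests.get('http://example.org')

I get a page that differs from the one I get in a browser or with

import urllib.request
urllib.request.urlopen('http://example.org')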

I tried using urllib instead, but it is really too slow: in a comparison test I ran, requests was about 50% faster!

How can I solve this?

Answer 1

After a lot of investigation, I found that the site sends a cookie in the response headers only on a visitor's first request to the site.
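One way to check for this kind of behaviour is to look at what a request without any cookies gets back. The sketch below is only illustrative: `first_visit_cookies` is a hypothetical helper name and the `timeout` value is an assumption, neither comes from the site in question.

```python
import requests


def first_visit_cookies(url, timeout=10):
    """Return the cookies a server sets on a request that carries no cookies."""
    # A bare requests.head() call starts with an empty cookie jar,
    # so the server sees it as a first-time visitor.
    response = requests.head(url, timeout=timeout)
    # response.cookies holds the cookies parsed from this response's Set-Cookie headers.
    return response.cookies.get_dict()
```

If the returned dictionary is non-empty, the server is handing out a cookie on the first request, and later requests that omit it may be served a different page.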

So the solution is to fetch the cookies with a HEAD request and then send them along with your GET request:

import requests
# Fetch the cookies with head(); this doesn't download the body, so it's fast
cookies = requests.head('http://example.com').cookies
# Send the GET request with those cookies attached
result = requests.get('http://example.com', cookies=cookies)

Now it's faster than urllib and returns the same result :)
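As a variation on the same idea, a `requests.Session` keeps cookies between calls, so they do not have to be passed by hand. This is a minimal sketch rather than the approach used above; `fetch_with_session` is a hypothetical helper name and the `timeout` value is an assumption.

```python
import requests


def fetch_with_session(url, timeout=10):
    """Request url twice through one Session so first-visit cookies are reused."""
    with requests.Session() as session:
        # HEAD skips the body; any Set-Cookie headers are stored in session.cookies.
        session.head(url, timeout=timeout)
        # The session attaches the stored cookies to this GET automatically.
        return session.get(url, timeout=timeout)
```

Calling `fetch_with_session('http://example.com')` returns the `Response` of the second request, which carries whatever cookies the first one received.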

Python requests returns a different web page than a browser or urllib
