I want to scrape the Heritrix home page using Python's requests module. When I try to open the page in Chrome, I get this error:
This server could not prove that it is 10.100.121.41; its security
certificate is not trusted by your computer's operating system. This
may be caused by a misconfiguration or an attacker intercepting your
connection.
But I can still proceed to the page. When I tried to scrape the same page with requests I got an SSL error, and after some digging I used the following code from an SO question:
r = requests.get(url, auth=(username, password), verify=False)
This gave me the following warning:
/usr/lib/python2.6/site-packages/requests/packages/urllib3/connectionpool.py:734: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
and the request returned a 401 status code. How can I fix this?
Answer 0 (score: 1)
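A 401 means the server requires authentication and did not accept what you sent, most likely because you are using the wrong method. Passing a (username, password) tuple as auth makes requests use HTTP Basic authentication. Another very common method that is built into requests is digest authentication. You can check whether the server expects digest authentication by looking at: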
r.headers.get('www-authenticate')
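The value should contain digest. (If it does not, the server is not expecting digest authentication.) You can use digest authentication in requests like this: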
import requests
from requests import auth
# Digest auth instead of the Basic auth that a (username, password) tuple implies
r = requests.get(url, auth=auth.HTTPDigestAuth(username, password), verify=False)
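If you would rather not guess, the header check described above can be turned into a small helper. This is only a sketch: auth_scheme and build_auth are hypothetical names, and the parser looks only at the first challenge in the WWW-Authenticate value, so a server that offers several schemes in one header would need more careful parsing.

```python
def auth_scheme(www_authenticate):
    """Return the lowercase scheme of the first challenge, or None if absent."""
    if not www_authenticate:
        return None
    # A challenge looks like: Digest realm="...", nonce="..."
    parts = www_authenticate.strip().split(None, 1)
    return parts[0].lower() if parts else None


def build_auth(scheme, username, password):
    """Map a scheme name to a requests auth object (hypothetical helper)."""
    # Imported here so auth_scheme() stays usable without requests installed.
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    if scheme == "digest":
        return HTTPDigestAuth(username, password)
    if scheme == "basic":
        return HTTPBasicAuth(username, password)
    raise ValueError("unsupported authentication scheme: %r" % (scheme,))
```

A possible use is to make one request without credentials, pass r.headers.get('www-authenticate') to auth_scheme, and then repeat the request with auth=build_auth(scheme, username, password).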
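The warning you are seeing is unrelated to the 401. It is only telling you that the request goes to an HTTPS site without certificate verification, so the connection could effectively be man-in-the-middled by an attacker. If you want to silence it, you can do the following: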
from requests.packages import urllib3
urllib3.disable_warnings()
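Note that disable_warnings() applies to the whole process. If you prefer to silence the warning only around the calls where verify=False is intentional, a scoped filter from the standard warnings module is one option. This is a sketch under assumptions: quiet_unverified_https is a hypothetical helper name, and the filter matches on the "Unverified HTTPS request" text that the warning quoted in the question starts with.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_unverified_https():
    """Ignore the unverified-HTTPS warning only inside the with block."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the text.
        warnings.filterwarnings("ignore", message="Unverified HTTPS request")
        yield
```

The request from the question would then be wrapped as "with quiet_unverified_https():" followed by the requests.get(...) call; other warnings, and the same warning elsewhere in the program, are still shown.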