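I don't understand how to use urllib3 or requests to connect to an HTTPS site, and it's driving me crazy. I have installed certifi and looked at the default .pem file it provides. On the machine where I run the script (I am not an administrator on it), I tried setting the verify option of requests to every .pem and .crt file I could find. I got nothing but errors.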
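For reference, here is a minimal sketch of pointing requests at a specific CA bundle. It assumes certifi is installed. The helper name make_session is hypothetical, and any .pem path can stand in for certifi.where().

```python
import requests


def make_session(ca_bundle):
    """Return a requests Session that verifies TLS against ca_bundle.

    ca_bundle is the path to a PEM file, e.g. certifi.where() or a custom
    bundle. Caveat: if REQUESTS_CA_BUNDLE or CURL_CA_BUNDLE is set in the
    environment, it takes precedence over session.verify; a per-request
    verify=... argument is not overridden that way.
    """
    session = requests.Session()
    session.verify = ca_bundle
    return session


if __name__ == "__main__":
    import certifi

    s = make_session(certifi.where())
    # Per-request equivalent: requests.get(url, verify=certifi.where())
    resp = s.get("https://portal.xsede.org/course-calendar/")
    print(resp.status_code)
```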
I switched to urllib3, and now I get:
H:\Projects\MyScraper\venv\Scripts\python.exe H:/Projects/MyScraper/MyScraper.py
Traceback (most recent call last):
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
conn.connect()
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connection.py", line 344, in connect
ssl_context=context)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\util\ssl_.py", line 342, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 814, in __init__
self.do_handshake()
File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "H:/Projects/MyScraper/MyScraper.py", line 15, in <module>
raw_html = HTTP.request('GET', 'https://portal.xsede.org/course-calendar/')
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\request.py", line 68, in request
**urlopen_kw)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\request.py", line 89, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\poolmanager.py", line 323, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
**response_kw)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
**response_kw)
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
**response_kw)
[Previous line repeated 6 more times]
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='portal.xsede.org', port=443): Max retries exceeded with url: /course-calendar/ (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),))
Process finished with exit code 1
My code is as follows:
#!/home/me/virtualenv/python3.6/3.6/bin/python
import certifi
import urllib3
from bs4 import BeautifulSoup

# Verify server certificates against certifi's CA bundle.
HTTP = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where(),
    retries=10
)
raw_html = HTTP.request('GET', 'https://portal.xsede.org/course-calendar/')
# request() returns an HTTPResponse; the page body (bytes) is in .data.
html = BeautifulSoup(raw_html.data, 'html.parser')
It blows up on the raw_html = HTTP.request(... line. Any ideas?
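The traceback shows urllib3 retrying the request (the "[Previous line repeated 6 more times]" frames) before it gives up with MaxRetryError. The following sketch assumes the same urllib3/certifi setup as the code above. It disables retries so that the underlying SSLError is raised on the first attempt. make_pool is a hypothetical helper name.

```python
import urllib3


def make_pool(ca_bundle):
    """Build a PoolManager that verifies against ca_bundle and never retries.

    With retries=False, urllib3 re-raises the original error (here an
    urllib3.exceptions.SSLError) immediately instead of wrapping it in
    MaxRetryError after several attempts.
    """
    return urllib3.PoolManager(
        cert_reqs="CERT_REQUIRED",
        ca_certs=ca_bundle,
        retries=False,
    )


if __name__ == "__main__":
    import certifi

    http = make_pool(certifi.where())
    try:
        resp = http.request("GET", "https://portal.xsede.org/course-calendar/")
        print(resp.status)
    except urllib3.exceptions.SSLError as exc:
        print("TLS verification failed:", exc)
```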
EDIT:
Hmm, this has something to do with my target host. If I request google.com instead, several of my .pem/.crt files work.
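One way to narrow this down is to test each candidate bundle against each host with the standard-library ssl module, outside requests and urllib3. This is a diagnostic sketch. check_bundle and the candidate file names are hypothetical.

```python
import socket
import ssl


def check_bundle(host, cafile=None, port=443, timeout=5.0):
    """Try a verified TLS handshake with host, trusting only the CAs in cafile.

    Returns (True, "ok") if the server's chain verifies, else (False, reason).
    cafile=None uses the system's default CA store instead.
    """
    try:
        # create_default_context() enables CERT_REQUIRED and hostname checking.
        context = ssl.create_default_context(cafile=cafile)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "ok"
    except (ssl.SSLError, ssl.CertificateError) as exc:  # e.g. CERTIFICATE_VERIFY_FAILED
        return False, "TLS error: %s" % exc
    except OSError as exc:  # missing cafile, DNS failure, refused, timeout
        return False, "error: %s" % exc


if __name__ == "__main__":
    # Hypothetical candidates; replace with the .pem/.crt files you tried.
    candidates = [None, "cacert.pem", "my-ca-bundle.pem"]
    for host in ("google.com", "portal.xsede.org"):
        for cafile in candidates:
            ok, reason = check_bundle(host, cafile)
            print(host, cafile, ok, reason)
```

If a bundle verifies google.com but not portal.xsede.org, the bundle most likely lacks a CA (often an intermediate) in that server's chain.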
Answer (score: 0)
The problem is that you are verifying the request against the wrong certificates.
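You can run the following command to see the certificate chain the server presents (substitute the host you are actually requesting, e.g. portal.xsede.org, for google.com), and then use those certificates in your request: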
openssl s_client -showcerts -connect google.com:443
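If you capture the chain that way, a sketch like the following (standard library only; the helper names and output file name are hypothetical) pulls the PEM blocks out of the s_client output and writes them to a bundle file. It only saves what the server sends. The root CA is normally not included, and a chain is not trustworthy just because a server presented it, so check it against a trusted source.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?\r?\n-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_pem_certs(s_client_output):
    """Return every PEM certificate block in `openssl s_client -showcerts` output."""
    return PEM_RE.findall(s_client_output)


def write_bundle(certs, path):
    """Write the certificates to path, one after another, as a PEM bundle."""
    with open(path, "w") as fh:
        for cert in certs:
            fh.write(cert.strip() + "\n")


if __name__ == "__main__":
    import subprocess

    host = "portal.xsede.org"
    # -servername sends SNI; input="" closes stdin so s_client exits.
    result = subprocess.run(
        ["openssl", "s_client", "-showcerts",
         "-connect", host + ":443", "-servername", host],
        input="", stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    certs = extract_pem_certs(result.stdout)
    write_bundle(certs, "server-chain.pem")
    print("saved %d certificate(s)" % len(certs))
```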
Also make sure you pass verify the path to a CA_BUNDLE file, or a directory, containing the certificates of trusted CAs.
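A common way to do that is to build a new bundle from certifi's file plus the extra certificate. This is preferable to editing certifi's cacert.pem in place, because that file is replaced whenever certifi is upgraded. The sketch assumes the missing intermediate or root certificate is available as a separate PEM file. build_ca_bundle and the file names are hypothetical.

```python
def build_ca_bundle(base_bundle, extra_certs, out_path):
    """Write out_path as base_bundle followed by each extra PEM file.

    base_bundle would typically be certifi.where(); extra_certs are paths
    to the certificates your target host needs but the base bundle lacks.
    """
    with open(out_path, "wb") as out:
        for path in [base_bundle, *extra_certs]:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            # Keep PEM blocks from running together on one line.
            if not data.endswith(b"\n"):
                out.write(b"\n")
    return out_path


if __name__ == "__main__":
    import certifi
    import requests

    bundle = build_ca_bundle(
        certifi.where(),
        ["missing-intermediate.pem"],  # hypothetical file
        "my-ca-bundle.pem",
    )
    resp = requests.get("https://portal.xsede.org/course-calendar/", verify=bundle)
    print(resp.status_code)
```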
This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.
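To confirm which bundle requests will actually use, the following sketch calls Session.merge_environment_settings(), the same step Session.request() performs before sending. With trust_env enabled, this picks up REQUESTS_CA_BUNDLE (falling back to CURL_CA_BUNDLE) when no explicit verify= is given. The helper name effective_verify and the bundle path are hypothetical.

```python
import os

import requests


def effective_verify(url, session=None):
    """Return the verify setting requests would apply to a request for url."""
    if session is None:
        session = requests.Session()
    # {} rather than None for proxies: when trust_env is on, environment
    # proxies are added to this dict with setdefault().
    settings = session.merge_environment_settings(url, {}, None, None, None)
    return settings["verify"]


if __name__ == "__main__":
    # Hypothetical path; equivalent to `set REQUESTS_CA_BUNDLE=...` in
    # cmd.exe or `export REQUESTS_CA_BUNDLE=...` in a POSIX shell.
    os.environ["REQUESTS_CA_BUNDLE"] = "my-ca-bundle.pem"
    print(effective_verify("https://portal.xsede.org/course-calendar/"))
```

Setting session.trust_env = False makes requests ignore these environment variables entirely.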
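If that still does not solve it, you can explicitly merge the environment settings into your session. As the requests documentation on prepared requests puts it:
When you are using the prepared request flow, keep in mind that it does not take into account the environment. This can cause problems if you are using environment variables to change the behaviour of requests. For example: self-signed SSL certificates specified in REQUESTS_CA_BUNDLE will not be taken into account. As a result, an SSL: CERTIFICATE_VERIFY_FAILED is thrown. You can get around this behaviour by explicitly merging the environment settings into your session: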
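from requests import Request, Session

url = 'https://portal.xsede.org/course-calendar/'

s = Session()
req = Request('GET', url)
prepped = s.prepare_request(req)

# Merge environment settings (proxies, REQUESTS_CA_BUNDLE, ...) into the session.
# Passing {} for proxies avoids an error if proxy environment variables are set.
settings = s.merge_environment_settings(prepped.url, {}, None, None, None)
resp = s.send(prepped, **settings)
print(resp.status_code)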