Accessing an HTTPS page with Python 3

Time: 2018-10-23 22:01:12

Tags: python-3.x urllib3

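I can't figure out how to connect to HTTPS websites using either urllib3 or requests. It's driving me crazy. I have installed certifi and can see the default .pem file it provides. I have tried setting the verify option of requests to every .pem and .crt file on the computer where I run the script (I am not the administrator of that machine). I get nothing but errors.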

I switched to urllib3 and now get:

H:\Projects\MyScraper\venv\Scripts\python.exe H:/Projects/MyScraper/MyScraper.py
Traceback (most recent call last):
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
    conn.connect()
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connection.py", line 344, in connect
    ssl_context=context)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\util\ssl_.py", line 342, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 814, in __init__
    self.do_handshake()
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:/Projects/MyScraper/MyScraper.py", line 15, in <module>
    raw_html = HTTP.request('GET', 'https://portal.xsede.org/course-calendar/')
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\request.py", line 68, in request
    **urlopen_kw)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\request.py", line 89, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\poolmanager.py", line 323, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
    **response_kw)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
    **response_kw)
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
    **response_kw)
  [Previous line repeated 6 more times]
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "H:\Projects\MyScraper\venv\lib\site-packages\urllib3\util\retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='portal.xsede.org', port=443): Max retries exceeded with url: /course-calendar/ (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),))

Process finished with exit code 1

My code is as follows:

    #!/home/me/virtualenv/python3.6/3.6/bin/python

    import certifi
    import urllib3
    from bs4 import BeautifulSoup

    HTTP = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',
        ca_certs=certifi.where(),
        retries=10
    )

    raw_html = HTTP.request('GET', 'https://portal.xsede.org/course-calendar/')

    html = BeautifulSoup(raw_html, 'html.parser')

It blows up on the raw_html = HTTP.request(... line. Any ideas?

Edit

Hmm, this has something to do with my target host. If I go to google.com instead, several of my pem / crt files work.

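1 answer:

Answer 0 (score: 0)

The problem is that you are making the request with the wrong certificate.

You can run the following command to check which certificates the server presents when a request is made, and then use that certificate in your own request (replace google.com with your target host):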

openssl s_client -showcerts -connect google.com:443
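If openssl is not available on the machine, a rough Python counterpart can be sketched with the standard library. This is only an illustration: the helper names parse_target and fetch_leaf_certificate are hypothetical, and ssl.get_server_certificate returns only the server's own (leaf) certificate, not the full chain that -showcerts prints.

```python
import ssl


def parse_target(target, default_port=443):
    """Split 'host:port' into (host, port); the port part is optional."""
    host, sep, port = target.rpartition(':')
    if not sep:
        # No colon present, so the whole string is the host name.
        return target, default_port
    return host, int(port)


def fetch_leaf_certificate(target):
    """Return the server's own certificate as PEM text.

    No validation is performed here, so this also works for hosts
    whose certificate currently fails verification.
    """
    return ssl.get_server_certificate(parse_target(target))
```

Calling fetch_leaf_certificate('portal.xsede.org:443') would contact the host from the question and return PEM text that can be saved to a file for inspection.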

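Also make sure that you pass verify the path to a CA_BUNDLE file or a directory containing certificates of trusted CAs.

This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.

If that does not resolve the problem, you can explicitly merge the environment settings into your session: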
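The two options above can be sketched as follows. This is a minimal illustration under stated assumptions, not the only way to do it: resolve_ca_bundle and fetch_page are hypothetical helper names, and the requests package is assumed to be installed. An explicit path wins, then REQUESTS_CA_BUNDLE, and otherwise verify=True leaves requests on its default bundle.

```python
import os


def resolve_ca_bundle(explicit_path=None):
    """Choose the value to pass as verify= (hypothetical helper).

    Order: explicit argument, then REQUESTS_CA_BUNDLE, then True,
    which tells requests to use its default CA bundle.
    """
    candidate = explicit_path or os.environ.get('REQUESTS_CA_BUNDLE')
    if candidate is None:
        return True
    if not os.path.exists(candidate):
        raise FileNotFoundError('CA bundle not found: ' + candidate)
    return candidate


def fetch_page(url, ca_bundle=None):
    """GET a page while verifying the server against the chosen CA bundle."""
    # Third-party import kept local so resolve_ca_bundle works without it.
    import requests
    return requests.get(url, verify=resolve_ca_bundle(ca_bundle), timeout=10)
```

For example, fetch_page('https://portal.xsede.org/course-calendar/', ca_bundle=certifi.where()) would verify against the certifi bundle mentioned in the question. Failing early on a missing path makes a mistyped bundle location easier to tell apart from a genuine verification failure.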

  

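When you are using the prepared request flow, keep in mind that it does not take the environment into account. This can cause problems if you are using environment variables to change the behaviour of requests. For example: self-signed SSL certificates specified in REQUESTS_CA_BUNDLE will not be taken into account. As a result, an SSL: CERTIFICATE_VERIFY_FAILED is thrown. You can get around this behaviour by explicitly merging the environment settings into your session: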

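    from requests import Request, Session

    # Target URL taken from the question above
    url = 'https://portal.xsede.org/course-calendar/'

    s = Session()
    req = Request('GET', url)

    prepped = s.prepare_request(req)

    # Merge environment settings (such as REQUESTS_CA_BUNDLE) into the session.
    # An empty dict is passed for proxies so environment proxies can be merged in.
    settings = s.merge_environment_settings(prepped.url, {}, None, None, None)
    resp = s.send(prepped, **settings)

    print(resp.status_code)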