无法获取HTTPS URL(请求包)

时间:2018-01-03 16:27:27

标签: python python-3.x https web-scraping python-requests

当我尝试按照此处的指南操作时:https://automatetheboringstuff.com/chapter11/我的脚本失败了:

import requests

res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
type(res)
res.raise_for_status()

请求已安装。

经过漫长的等待后,我收到以下错误消息,仅在使用HTTPS URL时出现;在使用Python 3.6.3 64位和Python 3.6.4 64位的两台Windows 10 64位计算机上也会出现同样的情况:

"C:\Program Files\Python36\python.exe" "C:/Users/user.name/Google Drive/Automation/RoHSWebScraper/main.py"
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\urllib3\contrib\pyopenssl.py", line 441, in wrap_socket
    cnx.do_handshake()
  File "C:\Program Files\Python36\lib\site-packages\OpenSSL\SSL.py", line 1716, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "C:\Program Files\Python36\lib\site-packages\OpenSSL\SSL.py", line 1449, in _raise_ssl_error
    raise SysCallError(-1, "Unexpected EOF")
OpenSSL.SSL.SysCallError: (-1, 'Unexpected EOF')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "C:\Program Files\Python36\lib\site-packages\urllib3\connection.py", line 326, in connect
    ssl_context=context)
  File "C:\Program Files\Python36\lib\site-packages\urllib3\util\ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Program Files\Python36\lib\site-packages\urllib3\contrib\pyopenssl.py", line 448, in wrap_socket
    raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\requests\adapters.py", line 440, in send
    timeout=timeout
  File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Program Files\Python36\lib\site-packages\urllib3\util\retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='automatetheboringstuff.com', port=443): Max retries exceeded with url: /files/rj.txt (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/user.name/Google Drive/Automation/RoHSWebScraper/main.py", line 3, in <module>
    res = requests.get('https://automatetheboringstuff.com/files/rj.txt', verify=False)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\adapters.py", line 506, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='automatetheboringstuff.com', port=443): Max retries exceeded with url: /files/rj.txt (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))

Process finished with exit code 1

任何人都可以帮我解决这个令人愤怒的问题!!

2 个答案:

答案 0 :(得分:1)

您可以尝试urllib

Python2:

import urllib
data = urllib.urlopen('https://automatetheboringstuff.com/files/rj.txt').read()

Python3:

import urllib.requests
data = urllib.requests.urlopen('https://automatetheboringstuff.com/files/rj.txt').read()

答案 1 :(得分:0)

事实证明,我公司网络上的计算机正在使用代理服务器,这阻止了我的HTTP和HTTPS请求正确连接。

我按照Lelouchzqy here的回答来确定我的HTTP和HTTPS代理服务器是什么。

然后我按照罗兰史密斯here的回答告诉requests使用哪些代理。

希望如果他们遇到同样的问题,这将有助于未来的人!