Goal: I am trying to scrape some data from a website.
Problem: I have to use the requests library, but it raises an error.
import requests
url = "https://www.random.com"
requests.get(url)
Attempted solution: I have another program that uses urllib3 to fetch data from the same website, and it works fine:
import urllib3
url = "https://www.random.com"
http = urllib3.PoolManager()
http.request("GET", url)
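I found out that requests and urllib3 share a common code path for sending requests (the traceback shows requests going through urllib3's connectionpool.py), so I thought I could spot the difference between the two and change my code accordingly. While reading the stack trace below, I noticed that for some reason requests cannot read the response. The snippet is the begin method that the traceback reaches in http/client.py: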
def begin(self):
    if self.headers is not None:
        # we've already started reading the response
        return
    # read until we get a non-100 response
    while True:
        version, status, reason = self._read_status()
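The last line of the code above is where the two differ. With requests, it raises an error without receiving any response. With urllib3, it gets a response and continues. I suspect this has to do with the security protocol that requests uses, but I am lost because so many variables are set before the request is sent.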
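One way to look at the security-protocol side in isolation is to open a TLS connection with the standard library and see which protocol version gets negotiated, with and without certificate verification. (The urllib3 log further down contains an InsecureRequestWarning, so that request was unverified, whereas requests verifies certificates by default.) This is only a rough sketch: the function names build_context and negotiated_tls_version are hypothetical, and whether verification has anything to do with the reset is an assumption, not something the trace proves.

```python
import socket
import ssl


def build_context(verify=True):
    """Create a client TLS context; verify=False mirrors an unverified request."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def negotiated_tls_version(host, port=443, verify=True, timeout=10.0):
    """Open a TLS connection and return the negotiated protocol version string."""
    context = build_context(verify)
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


# Example (needs network access, so it is left commented out):
# print(negotiated_tls_version("www.random.com", verify=True))
# print(negotiated_tls_version("www.random.com", verify=False))
```

If both calls succeed here but requests still fails, the cause is more likely in what is sent after the handshake (for example the request headers) than in the TLS setup itself.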
Full stack trace for the first snippet:
Traceback (most recent call last):
File "/path/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/path/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/path/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/path/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/path/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/path/lib/python3.7/http/client.py", line 257, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/path/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/path/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 290, in recv_into
raise SocketError(str(e))
OSError: (104, 'ECONNRESET')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/path/lib/python3.7/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/path/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/path/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/path/lib/python3.7/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/path/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/path/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/homes/ubalgans/miniconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/path/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/path/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/path/lib/python3.7/http/client.py", line 257, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/path/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/path/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 290, in recv_into
raise SocketError(str(e))
urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError("(104, 'ECONNRESET')"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/path/PycharmProjects/test/reqtest.py", line 8, in <module>
requests.get(url)
File "/path/lib/python3.7/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/path/lib/python3.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/path/lib/python3.7/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/path/lib/python3.7/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/path/lib/python3.7/site-packages/requests/adapters.py", line 495, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')"))
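The trace above is really one failure wrapped three times: the OSError from the socket layer, the urllib3 ProtocolError around it, and the requests ConnectionError on top, joined by the "During handling of the above exception" lines. A small sketch of walking such a chain down to its root cause follows. It uses stand-in exceptions so it runs without network access: RuntimeError stands in for urllib3.exceptions.ProtocolError and the built-in ConnectionError for requests.exceptions.ConnectionError. The names exception_chain and simulate_failure are hypothetical.

```python
def exception_chain(exc):
    """Return the exceptions linked to exc, outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # "raise X from Y" sets __cause__; raising inside an except block sets __context__.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return chain


def simulate_failure():
    """Mimic the three nested layers seen in the traceback (stand-ins only)."""
    try:
        try:
            raise OSError("(104, 'ECONNRESET')")  # socket layer
        except OSError as low:
            raise RuntimeError("Connection aborted.", low)  # stand-in for ProtocolError
    except RuntimeError as mid:
        raise ConnectionError(mid)  # stand-in for requests' ConnectionError


try:
    simulate_failure()
except ConnectionError as err:
    for layer in exception_chain(err):
        print(type(layer).__name__)
```

The last entry of the chain is the one worth reading first: here it is the connection reset reported while waiting for the status line.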
Edit: I enabled logging to see the difference between the two.
requests:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): random.com:443
urllib3:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): random.com:443
/path/urllib3/connectionpool.py:857: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
DEBUG:urllib3.connectionpool:https://random.com:443 "GET /request HTTP/1.1" 200 15189
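For anyone reproducing the comparison, debug output of this kind can be produced with the standard logging module. This is a minimal sketch, assuming the default basicConfig format (which matches the DEBUG:urllib3.connectionpool:... lines above); the function name enable_urllib3_debug_logging is hypothetical.

```python
import logging


def enable_urllib3_debug_logging():
    """Send DEBUG records from urllib3 (which requests also uses) to stderr."""
    # basicConfig installs a stderr handler with the "LEVEL:logger:message" format.
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("urllib3")
    logger.setLevel(logging.DEBUG)
    return logger


logger = enable_urllib3_debug_logging()
```

Call it once before issuing the request with either library; both log under the urllib3.connectionpool name, which makes the two runs directly comparable.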