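I have built a scraper script that downloads web pages using urllib3 and Python 3. I ran the crawler on a Google Cloud virtual machine instance and tried many different URLs. I found one URL that I cannot download on the VM instance but that downloads successfully on my personal Linux machine.
The code I ran on my own computer: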
>>> import urllib3
>>> pool = urllib3.PoolManager()
>>> response = pool.request('GET', 'http://www.azlyrics.com/lyrics/drake/dowhatyoudo.html')
>>> response.status
200
I tried exactly the same code on the VM instance over ssh, and it raised the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 391, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/urllib3/request.py", line 66, in request
    **urlopen_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/request.py", line 87, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/poolmanager.py", line 244, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 671, in urlopen
    release_conn=release_conn, **response_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 671, in urlopen
    release_conn=release_conn, **response_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 671, in urlopen
    release_conn=release_conn, **response_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 643, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/retry.py", line 303, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='www.azlyrics.com', port=80): Max retries exceeded with url: /lyrics/drake/dowhatyoudo.html (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)))
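To narrow this down, the request can be repeated with retries disabled so that the underlying error surfaces directly instead of being wrapped in MaxRetryError. The following is only a minimal diagnostic sketch, not a fix: the helper names root_cause and try_fetch are hypothetical, and the optional headers argument is there purely to test the unconfirmed guess that the server treats requests differently depending on what they send.

```python
def root_cause(exc):
    """Return the innermost exception wrapped inside exc.

    urllib3's MaxRetryError keeps the wrapped error in `.reason`, and
    ProtocolError carries the low-level error as one of its args.
    """
    while True:
        inner = getattr(exc, 'reason', None)
        if not isinstance(inner, BaseException):
            # Fall back to the first exception found among the args, if any.
            inner = next((a for a in exc.args if isinstance(a, BaseException)), None)
        if inner is None:
            return exc
        exc = inner


def try_fetch(url, headers=None):
    """Hypothetical helper: return the HTTP status, or the root-cause exception."""
    # Imported here so root_cause stays usable on machines without urllib3.
    import urllib3

    pool = urllib3.PoolManager()
    try:
        # retries=False makes urllib3 raise the first error immediately;
        # it also means a redirect response is returned rather than followed.
        response = pool.request('GET', url, headers=headers, retries=False)
        return response.status
    except urllib3.exceptions.HTTPError as exc:
        return root_cause(exc)
```

Calling try_fetch('http://www.azlyrics.com/lyrics/drake/dowhatyoudo.html') on both machines, once without headers and once with something like headers={'User-Agent': 'Mozilla/5.0'} (an arbitrary example value), would show whether the disconnect depends on the request itself or only on the machine it comes from.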
I have checked the VM dashboard, but I found no firewall rule or other restriction that I might have misconfigured.
What could be causing this?