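I am still new to Python and cannot figure out how to handle this error or how to avoid it, even after trying to understand the different methods of the Requests module and reading up on it.
Here is the simple request I use, where line loops over a text file of the different URLs I am trying to access, and d holds a list of dictionaries with the many URLs I use as proxies.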
    import requests
    import collections
    # [...]
    d = collections.deque(proxies)
    with requests.session() as r:
        d.rotate(-1)
        page = r.get(line.rstrip(), proxies=d[0])
It works perfectly until one of the proxies in the list times out for some reason and forces the script to raise this error:
ProxyError Traceback (most recent call last)
C:\Python27\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
195 else:
196 filename = fname
--> 197 exec compile(scripttext, filename, 'exec') in glob, loc
198 else:
199 def execfile(fname, *where):
C:\Users\Christopher Fargere\desktop\python\quick_scraper.py in <module>()
72 with requests.session() as r:
73 d.rotate(-1)
---> 74 page = r.get(line.rstrip(), proxies=d[0])
75 print d[0]
76 print page.status_code
C:\Python27\lib\site-packages\requests\sessions.pyc in get(self, url, **kwargs)
393
394 kwargs.setdefault('allow_redirects', True)
--> 395 return self.request('GET', url, **kwargs)
396
397 def options(self, url, **kwargs):
C:\Python27\lib\site-packages\requests\sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert)
381 'allow_redirects': allow_redirects,
382 }
--> 383 resp = self.send(prep, **send_kwargs)
384
385 return resp
C:\Python27\lib\site-packages\requests\sessions.pyc in send(self, request, **kwargs)
484 start = datetime.utcnow()
485 # Send the request
--> 486 r = adapter.send(request, **kwargs)
487 # Total elapsed time of the request (approximately)
488 r.elapsed = datetime.utcnow() - start
C:\Python27\lib\site-packages\requests\adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
379
380 except _ProxyError as e:
--> 381 raise ProxyError(e)
382
383 except (_SSLError, _HTTPError) as e:
ProxyError: Cannot connect to proxy. Socket error: [Errno 11001] getaddrinfo failed.
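When the error occurs, I would like to implement an if condition that pops the proxy from the d list and retries the same URL. I am sure it is very simple, but I cannot understand how errors are raised in Python. :(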
Answer 0 (score: 4)
To catch the exception, use exception handling; catch the raised ProxyError:
    from requests.exceptions import ProxyError

    with requests.session() as r:
        page = None
        for _ in range(len(d)):
            d.rotate(-1)
            try:
                page = r.get(line.rstrip(), proxies=d[0])
            except ProxyError:
                # ignore proxy exception, move to next proxy
                pass
            else:
                # success, break loop
                break
        if page is None:
            # none of the proxies worked
            raise ProxyError
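This tries at most every proxy in d, one after another. If none of them works, we raise ProxyError again, since you probably want to know that all of your proxies failed at that point.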
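The question also asked for the failing proxy to be popped from d before retrying the same URL, which the loop above does not do (it only rotates past it). A minimal sketch of that variant follows. The helper name get_dropping_dead_proxies and its parameters are hypothetical, not part of Requests; the fetch callable and the exception class are passed in so the logic can be exercised without network access.

```python
import collections


def get_dropping_dead_proxies(fetch, url, proxy_pool, proxy_errors):
    """Try proxies from the front of proxy_pool until one succeeds.

    fetch        -- callable invoked as fetch(url, proxies=proxy),
                    for example the get method of a requests session
    proxy_pool   -- collections.deque of proxy dicts; failed ones are removed
    proxy_errors -- exception class (or tuple of classes) marking a dead proxy
    """
    while proxy_pool:
        proxy = proxy_pool[0]
        try:
            page = fetch(url, proxies=proxy)
        except proxy_errors:
            # Drop the proxy that just failed and retry the same URL.
            proxy_pool.popleft()
        else:
            # Move the working proxy to the back so the next URL uses another.
            proxy_pool.rotate(-1)
            return page
    # Assumption: a plain RuntimeError is enough to signal an empty pool.
    raise RuntimeError("no working proxies left for %s" % url)
```

With the names from the question, this would presumably be called as page = get_dropping_dead_proxies(r.get, line.rstrip(), d, ProxyError). Note that it permanently removes a proxy after a single failure, which may be too aggressive if the failures are only temporary.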