ProxyError raised when using the Requests module

Time: 2014-02-11 10:49:56

Tags: python proxy python-requests

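I'm still new to Python and can't figure out how to handle this error or how to avoid it, even after trying to understand the different methods of the Requests module and reading up on it.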

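Here is the simple request I'm using, where line iterates over a text file of the different URLs I'm trying to access, and d is a deque of dictionaries (built from proxies) containing the many URLs I use as proxies.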

import requests
import collections

# [...]
d = collections.deque(proxies)

with requests.session() as r:
    d.rotate(-1)
    page = r.get(line.rstrip(), proxies=d[0])

It works perfectly until one of the proxies in the list times out for some reason and forces the script to raise this error:

ProxyError                                Traceback (most recent call last)
C:\Python27\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
    195             else:
    196                 filename = fname
--> 197             exec compile(scripttext, filename, 'exec') in glob, loc
    198     else:
    199         def execfile(fname, *where):

C:\Users\Christopher Fargere\desktop\python\quick_scraper.py in <module>()
     72         with requests.session() as r:
     73                 d.rotate(-1)
---> 74                 page = r.get(line.rstrip(), proxies=d[0])
     75                 print d[0]
     76                 print page.status_code

C:\Python27\lib\site-packages\requests\sessions.pyc in get(self, url, **kwargs)
    393
    394         kwargs.setdefault('allow_redirects', True)
--> 395         return self.request('GET', url, **kwargs)
    396
    397     def options(self, url, **kwargs):

C:\Python27\lib\site-packages\requests\sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert)
    381             'allow_redirects': allow_redirects,
    382         }
--> 383         resp = self.send(prep, **send_kwargs)
    384
    385         return resp

C:\Python27\lib\site-packages\requests\sessions.pyc in send(self, request, **kwargs)
    484         start = datetime.utcnow()
    485         # Send the request
--> 486         r = adapter.send(request, **kwargs)
    487         # Total elapsed time of the request (approximately)
    488         r.elapsed = datetime.utcnow() - start

C:\Python27\lib\site-packages\requests\adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    379
    380         except _ProxyError as e:
--> 381             raise ProxyError(e)
    382
    383         except (_SSLError, _HTTPError) as e:

ProxyError: Cannot connect to proxy. Socket error: [Errno 11001] getaddrinfo failed.

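I would like to implement an IF condition so that, when the error occurs, the code pops the proxy from the d list and retries the same URL. I'm sure it's very simple, but I can't understand how errors are raised in Python. :(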

1 answer:

Answer 0 (score: 4)

To catch an exception, use exception handling; catch the ProxyError that is raised:

import requests
from requests.exceptions import ProxyError

with requests.session() as r:
    page = None

    for _ in range(len(d)):
        d.rotate(-1)
        try:
            page = r.get(line.rstrip(), proxies=d[0])
        except ProxyError:
            # ignore proxy exception, move to next proxy
            pass
        else:
            # success, break loop
            break

    if page is None:
        # none of the proxies worked
        raise ProxyError

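This tries, at most, every proxy in d, one after another. If none of them works, we raise ProxyError again, since you probably want to know that all of your proxies failed at that point.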
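The question also asked about popping a failed proxy from d and retrying the same URL, rather than just skipping it. A minimal sketch of that variant follows. The function name fetch_with_proxy_pool and the parameters fetch and retry_on are hypothetical, not part of Requests; fetch is assumed to be any callable that accepts a URL and a proxies keyword, such as a session's get method.

```python
import collections


def fetch_with_proxy_pool(fetch, url, pool, retry_on=(Exception,)):
    """Try url through each proxy in pool, dropping proxies that fail.

    fetch    -- hypothetical callable(url, proxies=...), e.g. a session's get
    pool     -- collections.deque of proxy dicts; failed ones are removed
    retry_on -- exception types taken to mean "this proxy is unusable"
    """
    while pool:
        proxy = pool[0]
        try:
            page = fetch(url, proxies=proxy)
        except retry_on:
            # Discard the failed proxy and retry the same URL with the next one.
            pool.popleft()
        else:
            # Move the working proxy to the back so later calls spread the load.
            pool.rotate(-1)
            return page
    # Every proxy failed, or the pool was empty to begin with.
    raise RuntimeError("no working proxies left for %s" % url)
```

With the names from the question, this might be called as fetch_with_proxy_pool(r.get, line.rstrip(), d, retry_on=(ProxyError,)). Note that a proxy is removed permanently after a single failure, which may be too strict if the failures are only temporary.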