Question

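I need an explanation of how requests uses proxyDict, specifically the following:

1. Does it cycle evenly through all the proxies in the dictionary?

2. What happens if one of them goes down? Can requests handle that, or do I have to?

3. What happens if one gets "banned"? Will requests handle that?

4. If I make the call inside a function, will it still cycle through the proxies evenly?

So, if I have a proxy dictionary like this: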

proxyDict = { 
    'https' : 'https://IP1:PORT', 
    'https' : 'https://IP2:PORT', 
    'https' : 'https://IP3:PORT',
    'https' : 'https://IP4:PORT'
}

And I have a GET request:

s = requests.Session()
data = {"Username":"user", "Password":"pass"}
s.get(download_url, proxies = proxyDict, verify=False)

which might be inside a function, something like this (my question #4):

def foo(download_url, proxyDict, s):
    s.get(download_url, proxies=proxyDict, verify=False)

Also, is there a way to print which proxy is currently being used?

Answer 1

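I think you'll find that the keys in proxyDict should be protocols (e.g. http or https), and that requests will simply ignore proxies stored under keys such as http1.

If you enable DEBUG logging, you can see which proxy requests is using. Consider this initial request with no proxy: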
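One consequence is worth checking with a small sketch that needs no network. A Python dict keeps one value per key, so a literal that repeats 'https' collapses to a single entry, and a proxies dict can therefore carry at most one proxy per protocol. The name proxy_dict is illustrative; the IP1 to IP4 placeholders are the ones from the question.

```python
# A dict literal with a repeated key keeps only the last value assigned to it.
proxy_dict = {
    'https': 'https://IP1:PORT',
    'https': 'https://IP2:PORT',
    'https': 'https://IP3:PORT',
    'https': 'https://IP4:PORT',
}

print(len(proxy_dict))      # 1
print(proxy_dict['https'])  # https://IP4:PORT
```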

>>> import logging
>>> import requests
>>> logging.basicConfig(level='DEBUG')
>>> requests.get('http://google.com')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): google.com
DEBUG:urllib3.connectionpool:http://google.com:80 "GET / HTTP/1.1" 301 219
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.google.com
DEBUG:urllib3.connectionpool:http://www.google.com:80 "GET / HTTP/1.1" 200 4796
<Response [200]>

Now, let's set up a proxy dictionary:

>>> proxyDict={'http': 'http://squid.corp.example.com:3128'}

Re-issue the request using that dictionary:

>>> requests.get('http://google.com', proxies=proxyDict)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): squid.corp.example.com
DEBUG:urllib3.connectionpool:http://squid.corp.example.com:3128 "GET http://google.com/ HTTP/1.1" 301 219
DEBUG:urllib3.connectionpool:http://squid.corp.example.com:3128 "GET http://www.google.com/ HTTP/1.1" 200 4768
<Response [200]>

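You can see in the DEBUG messages that it is using the proxy rather than connecting directly. Now, if we use your proxy dictionary and make the same request...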

>>> proxyDict = { 
...     'https1' : 'https://IP1:PORT', 
...     'https2' : 'https://IP2:PORT', 
...     'https3' : 'https://IP3:PORT',
...     'https4' : 'https://IP4:PORT'
... }
>>> requests.get('http://google.com', proxies=proxyDict)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): google.com
DEBUG:urllib3.connectionpool:http://google.com:80 "GET / HTTP/1.1" 301 219
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.google.com
DEBUG:urllib3.connectionpool:http://www.google.com:80 "GET / HTTP/1.1" 200 4790
<Response [200]>

...you can see that it isn't using any proxy.
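The same behaviour can be checked without any network traffic. This is a minimal sketch that assumes the helper requests.utils.select_proxy is available in your installed version of requests (it is a utility function rather than part of the headline API, so treat its availability as an assumption). The URL and the proxy addresses are placeholders.

```python
from requests.utils import select_proxy

url = 'https://example.com/'

# Keys that are not protocol names never match, so no proxy is selected.
numbered = {'https1': 'https://IP1:PORT', 'https2': 'https://IP2:PORT'}
print(select_proxy(url, numbered))

# A key equal to the URL scheme matches, and its value is the proxy chosen.
by_scheme = {'https': 'https://IP1:PORT'}
print(select_proxy(url, by_scheme))
```

Note that a real request may additionally pick up proxies from environment variables such as HTTPS_PROXY, which this lookup does not show.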

Answer 2

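1. Does it cycle evenly through all the proxies in the dictionary?
No, it doesn't. proxies is a dictionary that maps protocols to proxies, and requests uses the proxy that matches the protocol of the request (if there is one).

2. What happens if one of them goes down? Can requests handle that, or do I have to?
If a proxy is unusable for some reason, requests raises an exception, which you can catch.

3. What happens if one gets "banned"? Will requests handle that?
No, but you can detect an IP ban by checking the status code and the response body.

4. If I make the call inside a function, will it still cycle through the proxies evenly?
No, it won't; see 1. However, you can create a list of proxies and loop over it.

An example: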

import requests

def next_proxy(current):
    '''Returns the next item in proxies.'''
    if not proxies:
        return None
    if current not in proxies or current == proxies[-1]:
        return proxies[0]
    return proxies[proxies.index(current)+1]

def bad_response(response, error_message='some message'):
    '''Detects ip ban and other bad responses.'''
    return response.status_code == 403 or error_message in response.text

proxies = [
    {'https':'https://177.131.51.155:53281', 'http':'http://177.131.51.155:53281'}, 
    {'https':'https://138.197.45.196:8118', 'http':'http://138.197.45.196:8118'}, 
    {'https':'https://153.146.159.139:8080', 'http':'http://153.146.159.139:8080'}, 
]

s = requests.Session()
proxy = None  # the first request goes out without a proxy
for _ in range(10):
    print("Using proxy:", proxy)
    try:
        r = s.get("http://jsonip.com/", proxies=proxy)
        print(r.text)
        if bad_response(r):
            print("bad response")
            #proxies.remove(proxy)
    except (requests.exceptions.ProxyError, requests.exceptions.ConnectionError):
        print("proxy error")
        # proxy is None on the first pass, so only remove real list entries
        if proxy in proxies:
            proxies.remove(proxy)
    proxy = next_proxy(proxy)
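An alternative way to rotate, sketched here under a few assumptions: itertools.cycle from the standard library hands out proxies round-robin, and the request itself is passed in as a callable so the rotation logic can be tried offline. The names get_with_rotation, fake_fetch, and retry_on are hypothetical and not part of requests. The default retry_on=(OSError,) relies on the exceptions raised by requests deriving from IOError/OSError.

```python
import itertools

def get_with_rotation(fetch, url, rotation, attempts, retry_on=(OSError,)):
    """Call fetch(url, proxy) with successive proxies until one works.

    Returns (proxy, result); raises RuntimeError if every attempt fails.
    """
    for _ in range(attempts):
        proxy = next(rotation)
        print("Using proxy:", proxy)
        try:
            return proxy, fetch(url, proxy)
        except retry_on:
            print("proxy error, trying the next one")
    raise RuntimeError("no proxy succeeded in %d attempts" % attempts)

# Placeholder proxies; replace with real addresses.
proxies = [
    {'http': 'http://IP1:PORT'},
    {'http': 'http://IP2:PORT'},
    {'http': 'http://IP3:PORT'},
]
# One shared iterator, so each call continues where the previous one stopped.
rotation = itertools.cycle(proxies)

def fake_fetch(url, proxy):
    # Stand-in for a real request: pretend the second proxy is down.
    if proxy is proxies[1]:
        raise ConnectionError("simulated dead proxy")
    return "ok via " + proxy['http']

used = [
    get_with_rotation(fake_fetch, "http://example.com/", rotation, len(proxies))[0]
    for _ in range(4)
]
```

With requests, the fetch callable could be something like lambda url, proxy: s.get(url, proxies=proxy, timeout=10), where s is the Session from the example above. Keeping the iterator outside the function is what makes the rotation continue evenly across calls.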

What does requests do with proxyDict?
