Requests' secret: pool_connections and pool_maxsize

Question

When a requests Session is initialized, two HTTPAdapter instances are created and mounted to http:// and https://.

This is how HTTPAdapter is defined:

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10,
                                    max_retries=0, pool_block=False)

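While I understand the meaning of pool_maxsize (the number of connections a pool can save), I don't understand what pool_connections means or what it does. The doc says: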

Parameters: 
pool_connections – The number of urllib3 connection pools to cache.

But what does "cache" mean here? And what is the point of using multiple connection pools?

Answer 1

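I wrote an article about this. Pasting it here:

Requests' secret: pool_connections and pool_maxsize

Requests is one of the best-known third-party libraries among Python programmers. With its simple API and high performance, people tend to use requests rather than urllib2, which the standard library provides for HTTP requests. However, people who use requests every day may not know its internals, and today I want to introduce two of them: pool_connections and pool_maxsize.

Let's start with Session: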

import requests

s = requests.Session()
s.get('https://www.google.com')

Pretty simple. You probably know that requests' Session can persist cookies. Cool. But did you know that Session has a mount method?

mount(prefix, adapter)
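Registers a connection adapter to a prefix. Adapters are sorted in descending order by key length.

No? Well, in fact you have already used this method when you initialize a Session object: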

class Session(SessionRedirectMixin):

    def __init__(self):
        ...
        # Default connection adapters.
        self.adapters = OrderedDict()
        self.mount('https://', HTTPAdapter())
        self.mount('http://', HTTPAdapter())

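Now comes the interesting part. If you have read Ian Cordasco's article Retries in Requests, you should know that HTTPAdapter can be used to provide retry functionality. But what exactly is an HTTPAdapter? Quoting the doc: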

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False)

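The built-in HTTP Adapter for urllib3.

Provides a general-case interface for Requests sessions to contact HTTP and HTTPS URLs by implementing the Transport Adapter interface. This class will usually be created by the Session class under the covers.

Parameters:
  * pool_connections - The number of urllib3 connection pools to cache.
  * pool_maxsize - The maximum number of connections to save in the pool.
  * max_retries (int) - The maximum number of retries each connection should attempt. Note, this applies only to failed DNS lookups, socket connections and connection timeouts, never to requests where data has made it to the server. By default, Requests does not retry failed connections. If you need granular control over the conditions under which we retry a request, import urllib3's Retry class and pass that instead.
  * pool_block - Whether the connection pool should block for connections.

Usage: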

>>> import requests
>>> s = requests.Session()
>>> a = requests.adapters.HTTPAdapter(max_retries=3)
>>> s.mount('http://', a)
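
The max_retries entry above mentions passing urllib3's Retry class instead of an integer. A minimal sketch of that, assuming the Retry class importable from urllib3.util.retry; the specific values (3 retries, 0.5 backoff, the status codes) are illustrative assumptions, not recommendations from the documentation:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Illustrative values only; tune them for your own service.
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])

s = requests.Session()
adapter = HTTPAdapter(max_retries=retry)

# Mount the same adapter for both schemes so both share the retry policy.
s.mount('https://', adapter)
s.mount('http://', adapter)

print(adapter.max_retries.total)
```

No request is sent here; the snippet only builds the configuration.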

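If the documentation above confuses you, here is my explanation: all an HTTPAdapter does is provide different configurations for different requests according to the target URL. Remember the code above?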

self.mount('https://', HTTPAdapter())
self.mount('http://', HTTPAdapter())

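It creates two HTTPAdapter objects with the default arguments pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False, and mounts them to https:// and http:// respectively. This means the configuration of the first HTTPAdapter() is used when you send a request to https://xxx, and the second HTTPAdapter() is used for requests to http://xxx. Although the two configurations are identical in this case, requests to http and https are still handled separately. We will see what that means later.

As I said, the main purpose of this article is to explain pool_connections and pool_maxsize.

First let's look at pool_connections. Yesterday I asked a question on Stack Overflow because I wasn't sure whether my understanding was correct, and the answer removed my uncertainty. As we all know, HTTP is based on TCP. An HTTP connection is also a TCP connection, which is identified by a tuple of five values: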
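
To see that the two schemes really are served by separate adapter objects, here is a small sketch using Session.get_adapter(), which returns the adapter whose prefix matches a URL. The example.com URLs are placeholders and no request is sent:

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# get_adapter() only looks up the mounted adapter; it does not open a connection.
https_adapter = s.get_adapter('https://example.com/')
http_adapter = s.get_adapter('http://example.com/')

print(type(https_adapter).__name__)
print(https_adapter is http_adapter)
```

Because the adapters are distinct objects, each one keeps its own set of connection pools.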

(<protocol>, <src addr>, <src port>, <dest addr>, <dest port>)

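Say you have established an HTTP/TCP connection with www.example.com. Assuming the server supports Keep-Alive, the next time you send a request to www.example.com/a or www.example.com/b you can use the same connection, because none of the five values changes. In fact, requests' Session automatically does this for you and will reuse connections whenever it can.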
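
As a rough illustration of why /a and /b can share a connection, the sketch below reduces a URL to its (scheme, host, port) endpoint. endpoint_key is a hypothetical helper written for this example, not part of requests or urllib3, and the real pool key in urllib3 carries more fields than these three:

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this article.
DEFAULT_PORTS = {'http': 80, 'https': 443}

def endpoint_key(url):
    """Hypothetical helper: reduce a URL to the endpoint a connection is tied to."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)

# Same endpoint, different paths: the connection can be reused.
same = endpoint_key('https://www.example.com/a') == endpoint_key('https://www.example.com/b')

# Different scheme (and therefore port): a different connection is needed.
different = endpoint_key('http://www.example.com/a') != endpoint_key('https://www.example.com/a')

print(same, different)
```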

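The question is, what determines whether you can reuse an old connection? Yes, pool_connections!

pool_connections - The number of urllib3 connection pools to cache.

I know, I know, I don't want to bring in so many terms either. This is the last one, I promise. To keep it simple: one connection pool corresponds to one host, that's all it is.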

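Here is an example (unrelated output lines are omitted; the imports and logging setup are included so that it runs as-is):

import logging
import requests
from requests.adapters import HTTPAdapter

logging.basicConfig(level=logging.DEBUG)  # show urllib3's connection log

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')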

"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2621
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
"""

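HTTPAdapter(pool_connections=1) is mounted to https://, which means only one connection pool can exist at a time. After calling s.get('https://www.baidu.com'), the cached connection pool is connectionpool('https://www.baidu.com'). Now s.get('https://www.zhihu.com') comes along, and the session finds that it cannot use the previously cached connection because it is not the same host (one connection pool corresponds to one host, remember?). So the session has to create a new connection pool, or connection if you prefer. Since pool_connections=1, the session cannot hold two connection pools at the same time, so it drops the old one, connectionpool('https://www.baidu.com'), and keeps the new one, connectionpool('https://www.zhihu.com'). The next get goes the same way. That is why we see three "Starting new HTTPS connection" lines in the log.

What if we set pool_connections to 2?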

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=2))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
"""

Great, now we only created connections twice and saved the time of one connection setup.
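
The two logs above can be reproduced with a toy model. The sketch below assumes the pools are cached with a least-recently-used policy and that each host needs exactly one connection; count_new_connections is a hypothetical function for illustration, not urllib3 code:

```python
from collections import OrderedDict

def count_new_connections(hosts, pool_connections):
    """Toy model: count how often a new pool (and connection) must be created."""
    pools = OrderedDict()  # host -> placeholder, ordered from oldest to newest use
    created = 0
    for host in hosts:
        if host in pools:
            # Cache hit: reuse the pool and mark it as most recently used.
            pools.move_to_end(host)
        else:
            # Cache miss: a new pool and a new connection are needed.
            created += 1
            pools[host] = True
            if len(pools) > pool_connections:
                pools.popitem(last=False)  # evict the least recently used pool
    return created

hosts = ['www.baidu.com', 'www.zhihu.com', 'www.baidu.com']
with_one_pool = count_new_connections(hosts, pool_connections=1)
with_two_pools = count_new_connections(hosts, pool_connections=2)
print(with_one_pool, with_two_pools)
```

With one cached pool every request opens a new connection; with two, the third request reuses the pool for www.baidu.com.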

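Finally, pool_maxsize.

First and foremost, you should only care about pool_maxsize if you use a Session in a multithreaded environment, for example when making concurrent requests from multiple threads using the same Session.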

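Actually, pool_maxsize is an argument for initializing urllib3's HTTPConnectionPool, which is exactly the connection pool we mentioned above. HTTPConnectionPool is a container for a collection of connections to a specific host, and pool_maxsize is the number of connections to save for reuse. If you run your code in one thread, it is neither possible nor necessary to create multiple connections to the same host, because the requests library is blocking, so HTTP requests are always sent one after another.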
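
One way to check how pool_maxsize travels from the adapter to urllib3 is to inspect the adapter's pool manager. This is a sketch that relies on the poolmanager attribute of HTTPAdapter and the connection_pool_kw attribute of urllib3's PoolManager, which are implementation details and could change between versions:

```python
from requests.adapters import HTTPAdapter

adapter = HTTPAdapter(pool_connections=1, pool_maxsize=2)

# Keyword arguments the PoolManager will hand to every per-host pool it creates.
pool_kwargs = adapter.poolmanager.connection_pool_kw

print(pool_kwargs['maxsize'], pool_kwargs['block'])
```

The maxsize entry is the adapter's pool_maxsize, and block is its pool_block setting.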

Things are different if there are multiple threads.

from threading import Thread

def thread_get(url):
    s.get(url)

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
"""

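See? It established two connections to the same host, www.zhihu.com. As I said, this can only happen in a multithreaded environment. In this case we create a connection pool with pool_maxsize=2, and there are never more than two connections at the same time, so that is enough. We can see that the requests from t3 and t4 did not create new connections; they reused the old ones.

What if the size is not enough?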

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start()
t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (3): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
"""

Now, with pool_maxsize=1, the warning appears as expected:

Connection pool is full, discarding connection: www.zhihu.com

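We can also see that, since only one connection can be saved in this pool, a new connection is created again for t3 or t4. Obviously this is very inefficient. That is why urllib3's documentation says:

If you're planning on using such a pool in a multithreaded environment, you should set the maxsize of the pool to a higher number, such as the number of threads.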
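
The behaviour in the two threaded logs can be modelled without any network access. The sketch below is a toy stand-in for a per-host pool, assuming the non-blocking mode (pool_block=False) used above; ToyPool and run_two_rounds are hypothetical names, and the real urllib3 pool is more involved:

```python
import queue

class ToyPool:
    """Toy stand-in for a per-host connection pool (not urllib3's real class)."""

    def __init__(self, maxsize):
        self.idle = queue.LifoQueue(maxsize)  # saved connections waiting for reuse
        self.created = 0
        self.discarded = 0

    def acquire(self):
        try:
            return self.idle.get(block=False)  # reuse a saved connection
        except queue.Empty:
            self.created += 1                  # "Starting new HTTPS connection"
            return object()

    def release(self, conn):
        try:
            self.idle.put(conn, block=False)   # save the connection for later
        except queue.Full:
            self.discarded += 1                # "Connection pool is full, discarding"

def run_two_rounds(maxsize):
    """Two rounds of two overlapping requests, like t1/t2 followed by t3/t4."""
    pool = ToyPool(maxsize)
    for _ in range(2):
        a, b = pool.acquire(), pool.acquire()
        pool.release(a)
        pool.release(b)
    return pool.created, pool.discarded

print(run_two_rounds(2))
print(run_two_rounds(1))
```

With maxsize=2 the second round reuses both connections; with maxsize=1 a third connection is created and two connections are discarded, which matches the counts in the logs.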

Last but not least, HTTPAdapter instances mounted to different prefixes are independent.

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
s.mount('https://baidu.com', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57669
"""

The code above is easy to understand, so I won't explain it.
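
For readers who want to confirm which adapter handles which URL, here is a short offline sketch using Session.get_adapter(). The variable names are assumptions for this example. Note that matching is a plain prefix match on the URL, so https://www.baidu.com does not match the https://baidu.com prefix:

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
default_adapter = HTTPAdapter(pool_connections=1, pool_maxsize=2)
baidu_adapter = HTTPAdapter(pool_connections=1, pool_maxsize=1)
s.mount('https://', default_adapter)
s.mount('https://baidu.com', baidu_adapter)

# Lookups only; no request is sent.
zhihu_uses_default = s.get_adapter('https://www.zhihu.com') is default_adapter
baidu_uses_own = s.get_adapter('https://baidu.com/s') is baidu_adapter
www_baidu_uses_default = s.get_adapter('https://www.baidu.com') is default_adapter

print(zhihu_uses_default, baidu_uses_own, www_baidu_uses_default)
```

This is why the zhihu requests in the example above are governed by pool_maxsize=2 and never trigger the pool-full warning.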

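I guess that's all. I hope this article helps you understand requests better. By the way, I created a gist here which contains all of the test code used in this article. Just download it and play with it :)

Appendix

For https, requests uses urllib3's HTTPSConnectionPool, but it is pretty much the same as HTTPConnectionPool, so I do not distinguish between them in this article.

Session's mount method ensures that the longest prefix gets matched first. Its implementation is pretty interesting, so I am posting it here.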

def mount(self, prefix, adapter):
    """Registers a connection adapter to a prefix.
    Adapters are sorted in descending order by key length."""
    self.adapters[prefix] = adapter
    keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]
    for key in keys_to_move:
        self.adapters[key] = self.adapters.pop(key)

Note that self.adapters is an OrderedDict.
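
To see the reordering at work, the sketch below applies the same logic to a standalone OrderedDict, with strings standing in for adapter objects. The module-level adapters dict and the find_adapter lookup are assumptions made for this illustration:

```python
from collections import OrderedDict

adapters = OrderedDict()

def mount(prefix, adapter):
    """Same logic as Session.mount, applied to a module-level dict."""
    adapters[prefix] = adapter
    # Move every shorter prefix behind the new one so longer prefixes come first.
    keys_to_move = [k for k in adapters if len(k) < len(prefix)]
    for key in keys_to_move:
        adapters[key] = adapters.pop(key)

def find_adapter(url):
    """Return the first adapter whose prefix matches, i.e. the longest match."""
    for prefix, adapter in adapters.items():
        if url.lower().startswith(prefix.lower()):
            return adapter
    raise ValueError('no adapter for %r' % url)

mount('https://', 'default-https')
mount('http://', 'default-http')
mount('https://baidu.com', 'baidu')

order = list(adapters)
print(order)
print(find_adapter('https://baidu.com/s'), find_adapter('https://www.zhihu.com'))
```

Popping and re-inserting a key moves it to the end of the OrderedDict, which is what keeps the prefixes sorted by descending length.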

Answer 2

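Requests uses urllib3 to manage its connections and other features.

Reusing connections is an important factor in keeping repeated HTTP requests performant. The urllib3 README explains:

Why do I want to reuse connections?

Performance. When you normally do a urllib call, a separate socket connection is created with each request. By reusing existing sockets (supported since HTTP 1.1), the requests will take up less resources on the server's end, and also provide a faster response time at the client's end. [...]

To answer your question, "pool_maxsize" is the number of connections to keep around per host (this is useful for multi-threaded applications), whereas "pool_connections" is the number of host pools to keep around. For example, if you're connecting to 100 different hosts and pool_connections=10, then only the latest 10 hosts' connections will be reused.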
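
The 100-hosts case can be checked offline against urllib3's PoolManager, which is what HTTPAdapter uses internally (pool_connections becomes its num_pools argument). In this sketch the hostN.example names are made up, and no socket is opened because creating a pool object does not connect:

```python
from urllib3 import PoolManager

manager = PoolManager(num_pools=10)

# Ask for a pool for each of 100 different hosts; keep references for comparison.
pools = [manager.connection_from_url('http://host%d.example' % i) for i in range(100)]

cached = len(manager.pools)

# The most recent host still has its original pool; the oldest one was evicted.
newest_reused = manager.connection_from_url('http://host99.example') is pools[99]
oldest_reused = manager.connection_from_url('http://host0.example') is pools[0]

print(cached, newest_reused, oldest_reused)
```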

Answer 3

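Thanks to @laike9m for the existing Q&A and article, but the existing answers fail to mention the subtleties of pool_maxsize and its relation to multithreaded code.

Summary

pool_connections is the number of connections that can be kept alive in the pool at a given time from one (host, port, scheme) endpoint. If you want to keep around a maximum of n open TCP connections in a pool for reuse with a Session, you want pool_connections=n.
pool_maxsize is effectively irrelevant for users of requests, because the default value of pool_block (in requests.adapters.HTTPAdapter) is False rather than True.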

Details

As correctly pointed out here, pool_connections is the maximum number of open connections given the adapter's prefix. It is best illustrated through an example:

>>> import requests
>>> from requests.adapters import HTTPAdapter
>>>
>>> from urllib3 import add_stderr_logger
>>>
>>> add_stderr_logger()  # Turn on requests.packages.urllib3 logging
2018-12-21 20:44:03,979 DEBUG Added a stderr logging handler to logger: urllib3
<StreamHandler <stderr> (NOTSET)>
>>>
>>> s = requests.Session()
>>> s.mount('https://', HTTPAdapter(pool_connections=1))
>>>
>>> # 4 consecutive requests to (github.com, 443, https)
... # A new HTTPS (TCP) connection will be established only on the first conn.
... s.get('https://github.com/requests/requests/blob/master/requests/adapters.py')
2018-12-21 20:44:03,982 DEBUG Starting new HTTPS connection (1): github.com:443
2018-12-21 20:44:04,381 DEBUG https://github.com:443 "GET /requests/requests/blob/master/requests/adapters.py HTTP/1.1" 200 None
<Response [200]>
>>> s.get('https://github.com/requests/requests/blob/master/requests/packages.py')
2018-12-21 20:44:04,548 DEBUG https://github.com:443 "GET /requests/requests/blob/master/requests/packages.py HTTP/1.1" 200 None
<Response [200]>
>>> s.get('https://github.com/urllib3/urllib3/blob/master/src/urllib3/__init__.py')
2018-12-21 20:44:04,881 DEBUG https://github.com:443 "GET /urllib3/urllib3/blob/master/src/urllib3/__init__.py HTTP/1.1" 200 None
<Response [200]>
>>> s.get('https://github.com/python/cpython/blob/master/Lib/logging/__init__.py')
2018-12-21 20:44:06,533 DEBUG https://github.com:443 "GET /python/cpython/blob/master/Lib/logging/__init__.py HTTP/1.1" 200 None
<Response [200]>

Above, the maximum number of connections is 1; it is (github.com, 443, https). If you want to request a resource from a new (host, port, scheme) triple, the Session will internally dump the existing connection to make room for the new one:

>>> s.get('https://www.rfc-editor.org/info/rfc4045')
2018-12-21 20:46:11,340 DEBUG Starting new HTTPS connection (1): www.rfc-editor.org:443
2018-12-21 20:46:12,185 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4045 HTTP/1.1" 200 6707
<Response [200]>
>>> s.get('https://www.rfc-editor.org/info/rfc4046')
2018-12-21 20:46:12,667 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4046 HTTP/1.1" 200 6862
<Response [200]>
>>> s.get('https://www.rfc-editor.org/info/rfc4047')
2018-12-21 20:46:13,837 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4047 HTTP/1.1" 200 6762
<Response [200]>

You can raise the number to pool_connections=2, then cycle between 3 unique host combinations, and you will see the same thing in play. (One other thing to note is that the session will retain and send back cookies in the same way.)

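Now for pool_maxsize, which is passed to urllib3.poolmanager.PoolManager and ultimately to urllib3.connectionpool.HTTPSConnectionPool. The docstring for maxsize is:

Number of connections to save that can be reused. More than 1 is useful in multithreaded situations. If block is set to False, more connections will be created but they won't be saved once they've been used.

Incidentally, block=False is the default for HTTPAdapter (as pool_block), and it is also the default for urllib3's HTTPConnectionPool. This implies that pool_maxsize has little to no effect on how many connections an HTTPAdapter opens; it only limits how many are kept for reuse.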
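
The effect of pool_block can be inspected on the pool objects themselves. A sketch, assuming the poolmanager attribute of HTTPAdapter and the block attribute of urllib3's connection pool (both implementation details); the github.com URL is only used as a pool key and no request is sent:

```python
from requests.adapters import HTTPAdapter

default_adapter = HTTPAdapter(pool_maxsize=1)
blocking_adapter = HTTPAdapter(pool_maxsize=1, pool_block=True)

# Creating a pool object does not open a socket; nothing is sent here.
default_pool = default_adapter.poolmanager.connection_from_url('https://github.com')
blocking_pool = blocking_adapter.poolmanager.connection_from_url('https://github.com')

print(default_pool.block, blocking_pool.block)
```

With pool_block=True, a thread that needs a connection while all pool_maxsize connections are in use waits for one to be returned instead of opening an extra one, so pool_maxsize becomes a hard cap per host.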

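Furthermore, requests.Session() is not thread-safe; you should not use the same session instance from multiple threads. (See here and here.) If you really want to use sessions with threads, the safer way is to give each thread its own local session instance via threading.local(), then use that session to make requests to multiple URLs.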
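
The answer's original code for this did not survive in this copy, so what follows is only a minimal sketch of the threading.local() idea, not the author's code. get_session and worker are hypothetical names, and no request is sent:

```python
import threading
import requests

_thread_local = threading.local()

def get_session():
    """Return a Session that belongs to the calling thread, creating it on first use."""
    if not hasattr(_thread_local, 'session'):
        _thread_local.session = requests.Session()
    return _thread_local.session

seen = []  # keeps references so sessions from finished threads stay alive

def worker():
    # Two calls inside one thread should yield the same Session object.
    seen.append((get_session(), get_session()))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen[0][0] is seen[0][1], seen[0][0] is seen[1][0])
```

Each thread then reuses its own connections through its own session, and no session object is shared across threads.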

What is the meaning of pool_connections in requests.adapters.HTTPAdapter?
