We are running the following code to upload to a GCP storage bucket in parallel. Judging by the warnings we see, we are quickly exhausting all the connections in the pool. Is there a way to configure the connection pool that the library uses?
def upload_string_to_bucket(content: str):
    blob = bucket.blob(cloud_path)
    blob.upload_from_string(content)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(upload_string_to_bucket, content_list)
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
(the warning above repeats for every discarded connection)
Answer 0: (score: 0)
I ran into a similar problem when downloading blobs in parallel.
This article may be a useful reference: https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/
Personally, I don't think increasing the connection pool size is the best solution; I prefer to group the uploads into chunks no larger than pool_maxsize.
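For reference, the article above describes raising the pool size on a plain requests.Session by mounting a larger HTTPAdapter. A minimal sketch of that (the pool sizes are illustrative, and plumbing such a session into google-cloud-storage's transport is not shown here):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_connections: number of host pools to cache;
# pool_maxsize: max connections kept per host pool.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=32)
session.mount("https://", adapter)
```

Requests to any https:// URL made through this session now reuse up to 32 connections per host instead of urllib3's default of 10.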
from typing import Iterable

def chunker(it: Iterable, chunk_size: int):
    chunk = []
    for index, item in enumerate(it):
        chunk.append(item)
        if not (index + 1) % chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

for chunk in chunker(content_list, 10):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(upload_string_to_bucket, chunk)
Of course, we could also yield each result as soon as it is ready, instead of waiting for a whole chunk to finish.
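A minimal sketch of that idea using concurrent.futures.as_completed (the doubling function and worker count here are illustrative placeholders, not from the original post):

```python
import concurrent.futures

def run_as_completed(fn, items, max_workers=4):
    # Submit every task up front, then yield each result the moment
    # its future finishes, regardless of submission order.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fn, item) for item in items]
        for future in concurrent.futures.as_completed(futures):
            yield future.result()

results = list(run_as_completed(lambda x: x * 2, range(5)))
```

Because as_completed returns futures in completion order, the results may arrive in any order; sort or tag them if order matters.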