Parallel Azure blob upload produces the warning "urllib3.connectionpool WARNING - Connection pool is full, discarding connection"

Time: 2020-09-18 15:49:25

Tags: python multithreading azure azure-blob-storage urllib3

I need to upload a large number of files (more than 100,000) to Azure Blob Storage, so I wrote a program that uploads them with multiple threads, like this:

from azure.storage.blob import BlobServiceClient, BlobClient
from itertools import repeat
from concurrent.futures import ThreadPoolExecutor
import os

def upload_single_blob(blob_service_client, blob_path):
    # Create a blob client using the local file name as the name for the blob
    blob_client = blob_service_client.get_blob_client(
        container='MyContainer', blob=blob_path)

    # Upload the file
    with open(blob_path, "rb") as data:
        blob_client.upload_blob(data)

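# make blob service client from connect str
# (assumption: the connection string is read from this environment variable;
# the original post does not show where connect_str is defined)
connect_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
blob_service_client = BlobServiceClient.from_connection_string(connect_str)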
# make file path list to upload
blob_path_list = os.listdir("./blob_files/")
blob_path_list = map(lambda x: "./blob_files/"+x, blob_path_list)
blob_path_list = list(blob_path_list)

# multi threading upload to blob
with ThreadPoolExecutor(max_workers=100) as executor:
    # consume the results so that an exception raised in a worker is not silently dropped
    list(executor.map(upload_single_blob, repeat(blob_service_client), blob_path_list))

However, when I ran this program on an Azure VM (OS: Ubuntu 18.04), I got a lot of warnings like this:

urllib3.connectionpool WARNING --Connection pool is full, discarding connection: myblobaccount.blob.core.windows.net

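I have not measured it precisely, but even with 100 threads uploading at the same time, there seem to be only about 10 simultaneous connections.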

How can I increase the number of connections?
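
One possible approach, offered as a sketch rather than a confirmed fix: the warning comes from the urllib3 pool behind `requests`, whose `HTTPAdapter` keeps at most 10 connections per host by default (`requests.adapters.DEFAULT_POOLSIZE`). When all 10 are busy, extra connections are still opened, but they are discarded afterwards instead of being reused, which is what the warning reports. The sketch below builds a `requests.Session` with a larger pool. The helper names `make_pooled_session` and `make_blob_service_client` are hypothetical, and passing `session=` through `from_connection_string` assumes that azure-storage-blob v12 forwards extra keyword arguments to azure-core's `RequestsTransport`; this should be checked against the installed SDK version.

```python
import requests
from requests.adapters import HTTPAdapter


def make_pooled_session(pool_size):
    """Return a requests.Session whose per-host connection pool holds pool_size connections."""
    session = requests.Session()
    # pool_connections: number of per-host pools to cache
    # pool_maxsize: number of connections kept for reuse in each pool
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def make_blob_service_client(connect_str, pool_size=100):
    """Hypothetical helper: create a BlobServiceClient that uses the larger pool."""
    # Imported here so the session helper above works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    # Assumption: the `session` keyword argument is forwarded to azure-core's
    # RequestsTransport, which then uses this session instead of creating its own.
    return BlobServiceClient.from_connection_string(
        connect_str, session=make_pooled_session(pool_size)
    )
```

Under these assumptions, the program above would call `make_blob_service_client(connect_str, pool_size=100)` in place of `BlobServiceClient.from_connection_string(connect_str)`, choosing `pool_size` to match `max_workers` so that each worker thread can reuse a pooled connection.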

0 answers:

No answers