I need to upload a large number of files (more than 100,000) to Azure Blob Storage, so I wrote a program that uploads them with multiple threads, like this:
from azure.storage.blob import BlobServiceClient, BlobClient
from itertools import repeat
from concurrent.futures import ThreadPoolExecutor
import os

def upload_single_blob(blob_service_client, blob_path):
    # Create a blob client using the local file name as the name for the blob
    blob_client = blob_service_client.get_blob_client(container='MyContainer',
                                                      blob=blob_path)
    # Upload the file
    with open(blob_path, "rb") as data:
        blob_client.upload_blob(data)

# Make the blob service client from the connection string
# (connect_str is assumed to be defined earlier and to hold the storage account connection string)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# Build the list of file paths to upload
blob_path_list = os.listdir("./blob_files/")
blob_path_list = map(lambda x: "./blob_files/" + x, blob_path_list)
blob_path_list = list(blob_path_list)

# Upload to blob storage with multiple threads
with ThreadPoolExecutor(max_workers=100) as executor:
    executor.map(upload_single_blob, repeat(blob_service_client), blob_path_list)
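A side note on the snippet above: `executor.map` returns a lazy iterator, and an exception raised inside a worker only surfaces when that result is retrieved, so failed uploads could go unnoticed if the iterator is never consumed. The following is a minimal sketch of one way to collect failures, using only the standard library. The names `run_all` and `fake_upload` are hypothetical, and `fake_upload` merely stands in for the real upload function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(task, items, max_workers=100):
    """Run task(item) for every item and return (succeeded, failed) lists."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the item it was created for.
        futures = {executor.submit(task, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failed.append((item, exc))
            else:
                succeeded.append(item)
    return succeeded, failed


def fake_upload(path):
    # Hypothetical stand-in for the real upload: fails for ".bad" paths.
    if path.endswith(".bad"):
        raise IOError(f"cannot upload {path}")
    return path


ok, errors = run_all(fake_upload, ["a.txt", "b.bad", "c.txt"], max_workers=4)
```

With the real upload function in place of `fake_upload`, the `errors` list could be logged or retried after the run.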
However, when I run this program on an Azure VM (running Ubuntu 18.04), I get a lot of warnings like this:
urllib3.connectionpool WARNING --Connection pool is full, discarding connection: myblobaccount.blob.core.windows.net
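I have not measured it precisely, but even with 100 threads uploading at once, there seem to be only about 10 connections open at any one time.
How can I increase the number of connections?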
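One possible direction, sketched under assumptions: the warning comes from urllib3's connection pool, and the `HTTPAdapter` in `requests` defaults to a pool size of 10, which would match the roughly 10 connections observed. The sketch below builds a `requests.Session` with a larger pool. The helper name `make_pooled_session` is hypothetical. Passing the session to the Azure SDK is shown only in comments, because it assumes that azure-core's `RequestsTransport` accepts a pre-built session and that the client accepts a `transport` keyword; both should be checked against the installed versions.

```python
import requests
from requests.adapters import HTTPAdapter


def make_pooled_session(pool_size=100):
    """Return a requests.Session whose connection pools hold up to pool_size connections."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    # Use the enlarged pool for both plain and TLS connections.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


# Sized to match max_workers=100 from the upload script.
session = make_pooled_session(pool_size=100)

# Assumed wiring into the Azure SDK (not verified here):
#   from azure.core.pipeline.transport import RequestsTransport
#   transport = RequestsTransport(session=session, session_owner=False)
#   blob_service_client = BlobServiceClient.from_connection_string(
#       connect_str, transport=transport)
```

Matching the pool size to the number of worker threads is a design choice made here for illustration. A smaller thread count may turn out to be sufficient once connections are actually reused.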