I have some images in a local folder on my Windows machine. I want to upload all of the images to the same container in the same blob storage.
I know how to use the Azure Storage SDK to
upload a single file, but I don't see a way to upload all of the images in a folder at once.
However, Azure Storage Explorer offers this functionality, so it must be possible somehow.
Is there a function that does this, or do I have to iterate over all the files in the folder myself and call BlockBlobService.create_blob_from_path() repeatedly against the same container?
Answer 0 (Score: 4)
There is no direct way to do this. You can browse the details in the Azure Storage Python SDK, in blockblobservice.py and baseblobservice.py.
As you mentioned, you have to iterate over the files. Sample code is below:
from azure.storage.blob import BlockBlobService, PublicAccess
import os

def run_sample():
    block_blob_service = BlockBlobService(account_name='your_account', account_key='your_key')
    container_name = 't1s'
    local_path = "D:\\Test\\test"
    # Upload every file in the local folder, using the file name as the blob name.
    for file_name in os.listdir(local_path):
        block_blob_service.create_blob_from_path(
            container_name, file_name, os.path.join(local_path, file_name))

# Main method.
if __name__ == '__main__':
    run_sample()
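Note that BlockBlobService belongs to the older azure-storage-blob package (2.x and earlier). If you are on the current v12 SDK instead, here is a minimal sketch of the same loop; the connection string, the container name 't1s', and the local path are placeholders taken from the example above:

# Minimal v12 sketch (assumption: azure-storage-blob >= 12 is installed and
# the container already exists; all values below are placeholders).
import os
from azure.storage.blob import ContainerClient

def upload_folder_v12():
    container_client = ContainerClient.from_connection_string(
        conn_str='your_connection_string', container_name='t1s')
    local_path = "D:\\Test\\test"
    for file_name in os.listdir(local_path):
        with open(os.path.join(local_path, file_name), "rb") as data:
            # Use the local file name as the blob name; overwrite if it already exists.
            container_client.upload_blob(name=file_name, data=data, overwrite=True)

# upload_folder_v12()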
Answer 1 (Score: 0)
You can get better upload performance by using multiple threads. Here is some code that does that:
from azure.storage.blob import BlobClient
from threading import Thread
import os

# Uploads a single blob. May be invoked in a thread.
def upload_blob(container, file, index=0, result=None):
    if result is None:
        result = [None]
    try:
        # extract blob name from file path
        blob_name = ''.join(os.path.splitext(os.path.basename(file)))
        blob = BlobClient.from_connection_string(
            conn_str='CONNECTION STRING',
            container_name=container,
            blob_name=blob_name
        )
        with open(file, "rb") as data:
            blob.upload_blob(data, overwrite=True)
        print(f'Upload succeeded: {blob_name}')
        result[index] = True  # example of returning result
    except Exception as e:
        print(e)  # do something useful here
        result[index] = False  # example of returning result

# container: string of container name. This example assumes the container exists.
# files: list of file paths.
def upload_wrapper(container, files):
    # here, you can define a better threading/batching strategy than what is written
    # this code just creates a new thread for each file to be uploaded
    parallel_runs = len(files)
    threads = [None] * parallel_runs
    results = [None] * parallel_runs
    for i in range(parallel_runs):
        t = Thread(target=upload_blob, args=(container, files[i], i, results))
        threads[i] = t
        threads[i].start()
    for i in range(parallel_runs):  # wait for all threads to finish
        threads[i].join()
    # do something with results here
There are probably better chunking/batching strategies than this; it is just an example to show that, in some cases, you can get higher blob upload throughput by using threads.
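One such strategy is to cap concurrency with a thread pool rather than spawning one thread per file. A minimal sketch, assuming the upload_blob function above is in scope and using an arbitrary max_workers value of 8:

from concurrent.futures import ThreadPoolExecutor

def upload_with_pool(container, files, max_workers=8):
    # Run at most max_workers uploads concurrently instead of one thread per file.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_blob, container, f) for f in files]
        for future in futures:
            future.result()  # block until each upload has finished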
For reference, I benchmarked a sequential loop approach against the threaded approach above on 482 image files (26 MB in total).
I would also add that you could consider invoking azcopy from Python, since that tool may be better suited to your particular needs.
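For example, here is a minimal sketch of shelling out to azcopy, assuming azcopy is installed and on PATH and that you have a SAS URL for the target container (both values below are placeholders):

import subprocess

def upload_folder_with_azcopy(local_folder, container_sas_url):
    # 'azcopy copy <source> <destination> --recursive' uploads the folder;
    # use '<folder>\*' as the source if you only want the folder's contents.
    subprocess.run(
        ["azcopy", "copy", local_folder, container_sas_url, "--recursive"],
        check=True)

# upload_folder_with_azcopy("D:\\Test\\test",
#     "https://<account>.blob.core.windows.net/<container>?<SAS>")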