Download all blobs in an Azure storage container

Date: 2016-05-06 19:21:36

Tags: python bash azure containers

I have managed to write a Python script that lists all the blobs in a container.

from azure.storage.blob import BlobService

blob_service = BlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')

blobs = []
marker = None
while True:
    batch = blob_service.list_blobs('<CONTAINER>', marker=marker)
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker

for blob in blobs:
    print(blob.name)

As I said, that only lists the blobs I want to download. I have since moved on to the Azure CLI to see whether it helps with what I am trying to do. I can download a single blob with

azure storage blob download [container]

It then prompts me for a blob name, which I can get from the Python script. So far the only way I have to download all the blobs is to run the command above and copy and paste each name into the prompt, one at a time. Is there a way I can either:

A. Write a bash script that iterates over the list of blobs, executing the command and then passing the next blob name to the prompt?

B. Specify the container to download in the Python script or the Azure CLI? Have I missed something that downloads an entire container?
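For option A, one approach (a sketch only; the argument order here simply mirrors the `azure storage blob download` command quoted above and is an assumption — check the CLI's `--help` output) is to build the download command for each blob name from the Python listing instead of pasting names into the interactive prompt:

```python
import subprocess

def build_download_command(container, blob_name):
    # Mirrors the `azure storage blob download` command from the question;
    # passing the blob name as an argument should avoid the interactive prompt.
    return ["azure", "storage", "blob", "download", container, blob_name]

def download_all(container, blob_names, dry_run=True):
    commands = [build_download_command(container, name) for name in blob_names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # requires the Azure CLI on PATH
    return commands

# Dry run: just show what would be executed.
for cmd in download_all("mycontainer", ["a.txt", "dir/b.txt"]):
    print(" ".join(cmd))
```

The `dry_run` flag lets you inspect the generated commands before actually invoking the CLI.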

4 Answers:

Answer 0 (score: 2)

@gary-liu-msft's solution is correct. I changed it a little so that the code now walks the container and the folder structure inside it (note: a container has no real folders, just paths), checks whether the same directory structure exists on the client, creates it if it does not, and downloads the blobs into those paths. It supports long paths with nested subdirectories.
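The path handling described above can be sketched in isolation (a minimal illustration, independent of the Azure SDK; the function name is mine): `os.path.split` separates the virtual folder prefix from the file name, and the prefix is mirrored as local directories.

```python
import os
import tempfile

def prepare_local_path(blob_name, root):
    # A blob name like "a/b/c.txt" has no real folders behind it; the
    # "a/b" prefix is just part of the name. Mirror it as local directories.
    head, tail = os.path.split(blob_name)
    target_dir = os.path.join(root, head) if head else root
    os.makedirs(target_dir, exist_ok=True)  # no-op if it already exists
    return os.path.join(target_dir, tail)

root = tempfile.mkdtemp()
local = prepare_local_path("a/b/c.txt", root)
print(local.endswith(os.path.join("a", "b", "c.txt")))  # True
```

The returned path is then safe to hand to a download call, because every parent directory is guaranteed to exist.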

from azure.storage.blob import BlockBlobService
import os

# name of your storage account and the access key from Settings -> Access keys -> key1
block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')

# name of the container
generator = block_blob_service.list_blobs('testcontainer')

# the code below lists all the blobs in the container and downloads them one after another
for blob in generator:
    print(blob.name)
    # if the blob name contains a path, mirror that folder structure locally
    if "/" in blob.name:
        # extract the folder path and create it locally if it does not exist yet
        head, tail = os.path.split(blob.name)
        local_dir = os.path.join(os.getcwd(), head)
        if not os.path.isdir(local_dir):
            print("directory doesn't exist, creating it now")
            os.makedirs(local_dir, exist_ok=True)
        block_blob_service.get_blob_to_path('testcontainer', blob.name, os.path.join(local_dir, tail))
    else:
        block_blob_service.get_blob_to_path('testcontainer', blob.name, blob.name)

The same code is also available here: https://gist.github.com/brijrajsingh/35cd591c2ca90916b27742d52a3cf6ba

Answer 1 (score: 1)

Currently there seems to be no single API that downloads all the blobs from a container directly. All the available blob operations are listed at https://msdn.microsoft.com/en-us/library/azure/dd179377.aspx.

So we can list the blobs with a ListGenerator and then download them in a loop, e.g.:

result = blob_service.list_blobs(container)
for b in result.items:
    r = blob_service.get_blob_to_path(container,b.name,"folder/{}".format(b.name))

Update

When using azure-storage-python, import the block blob service like this:

from azure.storage.blob import BlockBlobService

Answer 2 (score: 1)

Since @brij-raj-singh-msft's answer, Microsoft has released the Gen2 (v12) version of the Azure Storage Blobs client library for Python. (The code below was tested against version 12.5.0 on September 25, 2020.)

import os
import datetime
from azure.storage.blob import BlobServiceClient

# Assuming the AZURE_STORAGE_CONNECTION_STRING environment variable is set.
# If not, create the BlobServiceClient from the account URL and credentials instead.
# Example: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient

connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

blob_service_client = BlobServiceClient.from_connection_string(conn_str=connection_string)
# create a container client
container_name = 'test2'
container_client = blob_service_client.get_container_client(container_name)

def download_blob(blob_client, destination_file):
    print("[{}]:[INFO] : Downloading {} ...".format(datetime.datetime.utcnow(), destination_file))
    with open(destination_file, "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
    print("[{}]:[INFO] : Download finished".format(datetime.datetime.utcnow()))

# Check whether a top-level local folder exists for the container; if not, create one.
data_dir = 'Z:/azure_storage'
data_dir = data_dir + "/" + container_name
if not os.path.isdir(data_dir):
    print("[{}]:[INFO] : Creating local directory for container".format(datetime.datetime.utcnow()))
    os.makedirs(data_dir, exist_ok=True)

# The code below lists all the blobs in the container and downloads them one after another.
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("[{}]:[INFO] : Blob name: {}".format(datetime.datetime.utcnow(), blob.name))
    # if the blob name contains a path, mirror the folder structure locally
    if "/" in blob.name:
        # extract the folder path and create it locally if it does not exist yet
        head, tail = os.path.split(blob.name)
        if not os.path.isdir(data_dir + "/" + head):
            print("[{}]:[INFO] : {} directory doesn't exist, creating it now".format(datetime.datetime.utcnow(), data_dir + "/" + head))
            os.makedirs(data_dir + "/" + head, exist_ok=True)
    # Finally, download the blob
    blob_client = container_client.get_blob_client(blob.name)
    download_blob(blob_client, data_dir + "/" + blob.name)

The same code is also available here: https://gist.github.com/allene/6bbb36ec3ed08b419206156567290b13

Answer 3 (score: 0)

I made a Python wrapper around the Azure CLI that enables batch downloads and uploads. With it we can download a complete container, or only certain files from a container.

To install:

pip install azurebatchload

Example usage, downloading all PDFs from a container:

from azurebatchload.download import DownloadBatch

az_batch = DownloadBatch(
    destination='../pdfs',
    source='blobcontainername',
    pattern='*.pdf'
)
az_batch.download()
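For illustration, glob-style patterns like `*.pdf` above can be understood via the standard library's `fnmatch` module (a sketch of the filtering idea only; how azurebatchload implements its `pattern` parameter internally is an assumption):

```python
import fnmatch

blob_names = ["reports/q1.pdf", "q2.pdf", "notes.txt", "scan.PDF"]

# fnmatchcase matches shell-style wildcards case-sensitively on every
# platform; "*" also matches "/" because blob paths are plain strings.
pdfs = [n for n in blob_names if fnmatch.fnmatchcase(n, "*.pdf")]
print(pdfs)  # ['reports/q1.pdf', 'q2.pdf']
```

Note that `scan.PDF` is excluded here because the match is case-sensitive; a real tool might normalize case first.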