I managed to write a Python script that lists all the blobs in a container.
from azure.storage.blob import BlobService

# account_name is the storage account name, not the container name
blob_service = BlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')

blobs = []
marker = None
while True:
    batch = blob_service.list_blobs('<CONTAINER>', marker=marker)
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker
for blob in blobs:
    print(blob.name)
Like I said, though, this only lists the blobs I want to download. I have since moved on to the Azure CLI to see if it can help with what I want to do. I can download a single blob with azure storage blob download [container], which then prompts me to specify a blob name, which I can get from the Python script. The only way I have found to download all of the blobs is to copy and paste each name into the prompt after running the command above. Is there a way to either:
A. Write a bash script that iterates over the list of blobs, running the command and pasting the next blob name into the prompt, or
B. Specify downloading a whole container in the Python script or the Azure CLI. Is there something I have missed that downloads an entire container?
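For option A, a minimal non-interactive sketch is below. It assumes the blob names have been saved one per line to blobs.txt (stubbed here with two hypothetical names) and that the modern `az` CLI is installed; the `--container-name`, `--name`, and `--file` flags belong to `az storage blob download`, which accepts the blob name as an argument instead of prompting for it:

```shell
#!/usr/bin/env bash
# Option A: feed each blob name from blobs.txt to the CLI
# instead of pasting names into an interactive prompt.
CONTAINER=mycontainer
DRY_RUN=1   # set to 0 to actually call the Azure CLI

# stand-in for the Python script's output (hypothetical blob names)
printf '%s\n' "folder/a.txt" "b.txt" > blobs.txt

while IFS= read -r blob; do
  if [ "$DRY_RUN" = 1 ]; then
    # print the command that would run
    echo "az storage blob download --container-name $CONTAINER --name $blob --file ./$blob"
  else
    az storage blob download --container-name "$CONTAINER" --name "$blob" --file "./$blob"
  fi
done < blobs.txt
```

For option B, the modern CLI also offers `az storage blob download-batch --source <container> --destination <dir>`, which downloads an entire container in one command.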
Answer 0 (score: 2)
@gary-liu-msft's solution is correct. I changed it a bit so that the code now traverses the container and its folder structure (PS - there are no real folders in a container, just paths), checks whether the same directory structure exists on the client, creates it if it does not, and downloads the blobs into those paths. It supports long paths with nested subdirectories.
from azure.storage.blob import BlockBlobService
from azure.storage.blob import PublicAccess
import os

# name of your storage account, and the access key from Settings -> Access keys -> key1
block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')

# name of the container
generator = block_blob_service.list_blobs('testcontainer')

# the code below lists all the blobs in the container and downloads them one after another
for blob in generator:
    print(blob.name)
    # check if the blob name contains a folder structure; if so, mirror it locally
    if "/" in blob.name:
        print("there is a path in this")
        # extract the folder path and check whether it exists locally; create it if not
        head, tail = os.path.split(blob.name)
        print(head)
        print(tail)
        if os.path.isdir(os.getcwd() + "/" + head):
            # the directory exists, download the file into it
            print("directory and sub directories exist")
            block_blob_service.get_blob_to_path('testcontainer', blob.name, os.getcwd() + "/" + head + "/" + tail)
        else:
            # create the directory and download the file into it
            print("directory doesn't exist, creating it now")
            os.makedirs(os.getcwd() + "/" + head, exist_ok=True)
            print("directory created, download initiated")
            block_blob_service.get_blob_to_path('testcontainer', blob.name, os.getcwd() + "/" + head + "/" + tail)
    else:
        block_blob_service.get_blob_to_path('testcontainer', blob.name, blob.name)
The same code is also available here: https://gist.github.com/brijrajsingh/35cd591c2ca90916b27742d52a3cf6ba
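Since `os.makedirs` with `exist_ok=True` is a no-op when the directory already exists, the if/else branch above can be collapsed into a single path. A minimal sketch (`ensure_parent` is a hypothetical helper name, exercised here against a temporary directory):

```python
import os
import tempfile

def ensure_parent(base_dir, blob_name):
    """Create the local directory for a blob path like 'a/b/c.txt'
    and return the full local destination path."""
    head, _ = os.path.split(blob_name)
    if head:
        # no-op if the directory already exists, so no if/else is needed
        os.makedirs(os.path.join(base_dir, head), exist_ok=True)
    return os.path.join(base_dir, blob_name)

base = tempfile.mkdtemp()
path = ensure_parent(base, "reports/2020/summary.csv")
print(os.path.isdir(os.path.dirname(path)))  # True
```

With this helper, the loop body reduces to a single `get_blob_to_path('testcontainer', blob.name, ensure_parent(os.getcwd(), blob.name))` call.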
Answer 1 (score: 1)
Currently, it seems there is no single API that downloads all the blobs from a container directly. All the available blob operations are listed at https://msdn.microsoft.com/en-us/library/azure/dd179377.aspx.
So we can list the blobs (a ListGenerator) first, then download each blob in a loop. E.g.:
result = blob_service.list_blobs(container)
for b in result.items:
    r = blob_service.get_blob_to_path(container, b.name, "folder/{}".format(b.name))
When using azure-storage-python, import the block blob service like this:
from azure.storage.blob import BlockBlobService
Answer 2 (score: 1)
Since @brij-raj-singh-msft's answer, Microsoft has released the Gen2 version of the Azure Storage Blobs client library for Python. (The code below was tested against version 12.5.0, on September 25, 2020.)
import os
import datetime
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient

def download_blob_to_file(blob_client, destination_file):
    print("[{}]:[INFO] : Downloading {} ...".format(datetime.datetime.utcnow(), destination_file))
    with open(destination_file, "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
    print("[{}]:[INFO] : download finished".format(datetime.datetime.utcnow()))

# Assuming the AZURE_STORAGE_CONNECTION_STRING environment variable is set.
# If not, create the BlobServiceClient from a URL and credentials.
# Example: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient
connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(conn_str=connection_string)

# create a container client
container_name = 'test2'
container_client = blob_service_client.get_container_client(container_name)

# Check whether a top-level local folder exists for the container.
# If not, create one.
data_dir = 'Z:/azure_storage'
data_dir = data_dir + "/" + container_name
if not os.path.isdir(data_dir):
    print("[{}]:[INFO] : Creating local directory for container".format(datetime.datetime.utcnow()))
    os.makedirs(data_dir, exist_ok=True)

# the code below lists all the blobs in the container and downloads them one after another
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("[{}]:[INFO] : Blob name: {}".format(datetime.datetime.utcnow(), blob.name))
    # if the blob name contains a folder structure, mirror it locally
    if "/" in blob.name:
        # extract the folder path and check whether it exists locally; create it if not
        head, tail = os.path.split(blob.name)
        if not os.path.isdir(data_dir + "/" + head):
            # create the directory before downloading into it
            print("[{}]:[INFO] : {} directory doesn't exist, creating it now".format(datetime.datetime.utcnow(), data_dir + "/" + head))
            os.makedirs(data_dir + "/" + head, exist_ok=True)
    # Finally, download the blob
    blob_client = container_client.get_blob_client(blob.name)
    download_blob_to_file(blob_client, data_dir + "/" + blob.name)
The same code is also available here: https://gist.github.com/allene/6bbb36ec3ed08b419206156567290b13
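The streaming pattern in the download helper above (open the destination file and let the downloader's `readinto` write into it) can be exercised without a storage account by substituting a stand-in for the SDK's downloader object. `FakeDownloader` below is hypothetical; in real code the downloader comes from `blob_client.download_blob()`:

```python
import os
import tempfile

class FakeDownloader:
    """Hypothetical stand-in for the SDK's StorageStreamDownloader:
    its readinto(stream) writes the blob's bytes into a writable stream."""
    def __init__(self, data):
        self._data = data

    def readinto(self, stream):
        stream.write(self._data)
        return len(self._data)

def save_download(downloader, destination_file):
    # same shape as the helper above: open the file and stream into it
    with open(destination_file, "wb") as f:
        downloader.readinto(f)

dest = os.path.join(tempfile.mkdtemp(), "blob.bin")
save_download(FakeDownloader(b"hello blob"), dest)
with open(dest, "rb") as f:
    print(f.read())  # b'hello blob'
```

Streaming into an open file this way avoids holding the whole blob in memory, which matters for large blobs.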
Answer 3 (score: 0)
I made a Python wrapper for the Azure CLI that enables batch downloads/uploads. With it we can download a complete container, or just certain files from a container.
To install:
pip install azurebatchload
import os
from azurebatchload.download import DownloadBatch

az_batch = DownloadBatch(
    destination='../pdfs',
    source='blobcontainername',
    pattern='*.pdf'
)
az_batch.download()