GCS - Download blobs with their directory structure in Python

Date: 2018-01-25 18:07:37

Tags: python google-cloud-platform google-cloud-storage

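I'm using a combination of the GCS Python SDK and the Google API client to loop through a versioning-enabled bucket and download specific objects based on their metadata.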


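The function above works for blobs that are not inside a directory (gs://bucketname/test1.txt), because the item passed in is just test1.txt. The problem comes when I try to download a file from a deeper directory tree (gs://bucketname/nfs/media/docs/test1.txt): the item passed in is then nfs/media/docs/test1.txt. Is it possible to have the .download_to_file() method create the directories if they don't exist?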
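For context: as far as I know, the google-cloud-storage client's download_to_filename() simply opens the local path for writing and does not create missing parent directories, so they have to be created beforehand. A minimal sketch of that step using only the standard library (the helper name local_path_for is hypothetical, not part of any Google library):

```python
import os


def local_path_for(object_name, root):
    """Map a GCS object name such as 'nfs/media/docs/test1.txt' to a path
    under `root`, creating any missing parent directories first."""
    # Object names always use '/' as a separator, whatever the local OS.
    local_path = os.path.join(root, *object_name.split('/'))
    parent = os.path.dirname(local_path)
    if parent:
        # makedirs creates every missing level (unlike os.mkdir), and
        # exist_ok=True makes repeated calls harmless.
        os.makedirs(parent, exist_ok=True)
    return local_path
```

It would then be used as blob.download_to_filename(local_path_for(blob.name, restore_location)).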

2 Answers:

Answer 0 (score: 1)

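GCS has no concept of "directories," although tools like gsutil do a good job of pretending otherwise for convenience. If you want all objects under the "nfs/media/docs/" path, you can specify it as a prefix, like this: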

request = service.objects().list(
    bucket=bucket_name,
    versions=True,
    prefix='nfs/media/docs/',  # Only show objects beginning like this
    delimiter='/'  # Consider this character a directory marker.
)
response = request.execute()
# Both keys are omitted from the response when they would be empty.
subdirectories = response.get('prefixes', [])
objects = response.get('items', [])

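Because of the prefix parameter, only objects whose names begin with 'nfs/media/docs/' are considered. Because of the delimiter parameter, objects directly under that prefix are returned in response['items'], while deeper "subdirectories" are rolled up and returned in response['prefixes']. You can find more details in the Python documentation of the objects.list method.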
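To make the prefix/delimiter split concrete, here is a small pure-Python model of it. This is only an illustration of the listing rules described above, not the real API; simulate_list is a hypothetical name and the object names are made up.

```python
def simulate_list(names, prefix='', delimiter=None):
    """Model how a prefix/delimiter listing splits object names.

    Returns (items, prefixes): `items` are names that would come back as
    objects, `prefixes` are the rolled-up "subdirectory" strings."""
    items = []
    prefixes = set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # the prefix filters out everything else
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Deeper names collapse into one prefix that ends at the first
            # delimiter after `prefix`.
            cut = rest.index(delimiter) + len(delimiter)
            prefixes.add(prefix + rest[:cut])
        else:
            items.append(name)
    return items, sorted(prefixes)


names = ['nfs/media/docs/test1.txt', 'nfs/media/docs/old/test2.txt',
         'nfs/media/docs/old/deep/x.txt', 'nfs/other.txt', 'test1.txt']
print(simulate_list(names, 'nfs/media/docs/', '/'))
```

With the delimiter, only nfs/media/docs/test1.txt is an item and nfs/media/docs/old/ is the single prefix; without it, all three names under the prefix come back as items.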

If you use the newer google-cloud Python library, which I recommend for new code, the same call looks pretty similar:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket(bucket_name)
iterator = bucket.list_blobs(
    versions=True,
    prefix='nfs/media/docs/',
    delimiter='/'
)
objects = list(iterator)  # Consume the iterator first...
subdirectories = iterator.prefixes  # ...prefixes are filled in as pages are read.

Answer 1 (score: 1)

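Below is the working solution. I ended up taking the path from the object name and creating the directory structure dynamically. A better approach would probably be what @Brandon Yarbrough suggested, using prefix + response['prefixes'][0], but I couldn't get that working. Hope this helps someone else.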

#!/usr/local/bin/python3

from google.cloud import storage
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json
import os
import pathlib

bucket_name = 'test-bucket'
restore_epoch = '1519189202'
restore_location = '/Users/admin/data/'

credentials = GoogleCredentials.get_application_default()
service = discovery.build('storage', 'v1', credentials=credentials)

storage_client = storage.Client()
source_bucket = storage_client.get_bucket(bucket_name)


def listall_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    response = request.execute()
    print(json.dumps(response, indent=2))


def listname_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    response = request.execute()

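    for item in response.get('items', []):
        # 'metadata' is absent on objects uploaded without custom metadata.
        epoch = item.get('metadata', {}).get('epoch', 'n/a')
        print(item['name'] + ' Uploaded on: ' + item['updated'] +
              ' Epoch: ' + epoch)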


def downloadepoch_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    # Follow nextPageToken so buckets with more than one page are covered.
    while request is not None:
        response = request.execute()
        for item in response.get('items', []):
            if item.get('metadata', {}).get('epoch') != restore_epoch:
                continue
            print('Downloading ' + item['name'] + ' from ' +
                  item['bucket'] + '; Epoch= ' + restore_epoch)
            # Pin the exact generation that carries this epoch; otherwise
            # the live version would be downloaded instead.
            blob = source_bucket.blob(item['name'],
                                      generation=int(item['generation']))
            local_path = pathlib.Path(restore_location, *item['name'].split('/'))
            # parents=True creates nested directories such as nfs/media/docs;
            # os.mkdir only creates a single level.
            local_path.parent.mkdir(parents=True, exist_ok=True)
            print('Saving to: ' + str(local_path))
            blob.download_to_filename(str(local_path))
            print('Download complete')
        request = service.objects().list_next(request, response)


# listall_objects()
# listname_objects()
downloadepoch_objects()
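The same restore can likely be done with the google-cloud library alone, without the discovery client: list_blobs(versions=True) also yields noncurrent generations, and its iterator follows page tokens itself. A minimal sketch, assuming `bucket` behaves like google.cloud.storage.Bucket; restore_by_epoch is a hypothetical name, and my understanding (worth verifying against your library version) is that a Blob returned from a versioned listing downloads its own generation:

```python
import os


def restore_by_epoch(bucket, epoch, root):
    """Download every object version whose custom metadata 'epoch' equals
    `epoch`, recreating the directory layout under `root`."""
    restored = []
    for blob in bucket.list_blobs(versions=True):
        metadata = blob.metadata or {}  # None when no custom metadata is set
        if metadata.get('epoch') != epoch:
            continue
        local_path = os.path.join(root, *blob.name.split('/'))
        parent = os.path.dirname(local_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # Assumption: the listed blob carries its generation, so this
        # fetches that version rather than the live one.
        blob.download_to_filename(local_path)
        restored.append((blob.name, blob.generation))
    return restored
```

Note that if two matching versions share a name, the later one overwrites the earlier file.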