This is the code I'm currently using, and I need it to run faster by whatever means possible:
src_uri = boto.storage_uri(bucket, google_storage)
for obj in src_uri.get_bucket():
    f.write('%s\n' % (obj.name))
Answer 0 (score: 2)
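Here is an example that uses the Google API Client Library for Python to call the underlying Google Cloud Storage API directly through its RESTful HTTP interface. With this approach, you can use request batching to retrieve the names of all the objects in a single HTTP request (which reduces the overhead of extra HTTP requests), and you can use field projection on the objects.get operation (by setting &fields=name) to get a partial response, so that you don't send all the other fields and data over the network (or wait for unnecessary data to be retrieved on the backend).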
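To make the field projection concrete, here is a minimal sketch of the kind of URL that ends up being requested. The base URL `API_ROOT` and the helper name `list_objects_url` are assumptions made for illustration; the client library builds the real request for you, so this only shows where the `fields` parameter goes.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the Cloud Storage JSON API (v1).
API_ROOT = 'https://storage.googleapis.com/storage/v1'


def list_objects_url(bucket_name, fields='nextPageToken,items(name)',
                     page_token=None):
    """Build an objects.list URL that asks only for the given fields.

    Hypothetical helper, not part of any Google library.
    """
    params = {'fields': fields}
    if page_token:
        params['pageToken'] = page_token
    path = '/b/%s/o' % quote(bucket_name, safe='')
    # Leave commas and parentheses unescaped so the projection stays readable.
    return '%s%s?%s' % (API_ROOT, path, urlencode(params, safe=',()'))


print(list_objects_url('my-bucket'))
```

With `fields` set this way, each page of the response carries only the object names and the token for the next page.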
The code for this approach looks like this:
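# Assumed import; older releases expose the same module as `apiclient`.
from googleapiclient import discovery

def get_credentials():
    # Your code goes here... check out the oauth2client documentation:
    # http://google-api-python-client.googlecode.com/hg/docs/epy/oauth2client-module.html
    # Or look at some of the existing samples for how to do this
    raise NotImplementedError('supply your own credentials here')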

def get_cloud_storage_service(credentials):
    return discovery.build('storage', 'v1', credentials=credentials)

def get_objects(cloud_storage, bucket_name, autopaginate=False):
    result = []
    # Actually, it turns out that request batching isn't needed in this
    # example, because the objects.list() operation returns not just
    # the URL for the object, but also its name, as well. If it had returned
    # just the URL, then that would be a case where we'd need such batching.
    projection = 'nextPageToken,items(name,selfLink)'
    request = cloud_storage.objects().list(bucket=bucket_name, fields=projection)
    while request is not None:
        response = request.execute()
        # The response is a dict; an empty bucket has no 'items' key.
        result.extend(response.get('items', []))
        if autopaginate:
            # list_next() returns None once there are no more pages.
            request = cloud_storage.objects().list_next(request, response)
        else:
            request = None
    return result

def main():
    credentials = get_credentials()
    cloud_storage = get_cloud_storage_service(credentials)
    bucket = '...'  # ... your bucket name ...
    for obj in get_objects(cloud_storage, bucket, autopaginate=True):
        # Each item is a plain dict parsed from the JSON response.
        print('name=%s, selfLink=%s' % (obj['name'], obj['selfLink']))
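To connect this back to the loop in the question, the following is a rough sketch that writes one object name per line to a file, page by page, instead of building the whole list in memory. The function name `write_object_names` is hypothetical, and `cloud_storage` is assumed to be a service object like the one returned by `get_cloud_storage_service()`.

```python
def write_object_names(cloud_storage, bucket_name, out):
    """Write one object name per line to `out`; return how many were written.

    Hypothetical helper: `cloud_storage` is assumed to be a service object
    built with discovery.build('storage', 'v1', ...).
    """
    count = 0
    # Ask only for the names and the paging token, nothing else.
    request = cloud_storage.objects().list(
        bucket=bucket_name, fields='nextPageToken,items(name)')
    while request is not None:
        response = request.execute()
        for obj in response.get('items', []):
            out.write('%s\n' % obj['name'])
            count += 1
        # list_next() returns None when there are no further pages.
        request = cloud_storage.objects().list_next(request, response)
    return count
```

It could be called as `write_object_names(cloud_storage, bucket, f)` with `f` opened for writing. Whether this is actually faster than the boto loop depends on the bucket and the network, so it is worth timing both.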
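You may find the Google Cloud Storage Python Example and the other API Client Library Examples helpful for understanding how to do this. There are also some YouTube videos on the Google Developers channel, such as Accessing Google APIs: Common code walkthrough, that provide walkthroughs.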
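For the case the code comment describes, where per-object lookups really are needed, a batched version might look roughly like the sketch below. It is only an illustration: `get_fields_batched` is a hypothetical name, the object names are assumed to be unique (they double as request IDs), and the service caps how many calls one batch may contain, so a long list would need to be split into chunks.

```python
def get_fields_batched(cloud_storage, bucket_name, object_names,
                       fields='name,size'):
    """Fetch `fields` for each named object with one batched HTTP request.

    Hypothetical helper: `cloud_storage` is assumed to be a service object
    built with discovery.build('storage', 'v1', ...).
    Returns (results, errors), both dicts keyed by object name.
    """
    results, errors = {}, {}

    def on_response(request_id, response, exception):
        # The client library calls this once per sub-request in the batch.
        if exception is not None:
            errors[request_id] = exception
        else:
            results[request_id] = response

    batch = cloud_storage.new_batch_http_request(callback=on_response)
    for name in object_names:
        request = cloud_storage.objects().get(
            bucket=bucket_name, object=name, fields=fields)
        # Request IDs must be unique within a batch.
        batch.add(request, request_id=name)
    batch.execute()
    return results, errors
```

Collecting failures in a separate dict keeps one missing object from hiding the results for the rest.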