Question

This is the code I am currently using. Is there any way to make it run faster?

src_uri = boto.storage_uri(bucket, google_storage)
for obj in src_uri.get_bucket():
    f.write('%s\n' % (obj.name))

Answer 1

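Here is an example that uses the underlying Google Cloud Storage API directly, consuming the RESTful HTTP API through the Google API Client Library for Python. With this approach you can use request batching to retrieve the names of all objects in a single HTTP request (which reduces the overhead of extra HTTP requests), and you can use field projection on the objects.get operation (by setting &fields=name) to get a partial response, so that you are not sending all the other fields and data over the network (or waiting for the backend to retrieve data you do not need).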
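To make the partial-response idea concrete, here is a minimal sketch of the request URL that a field projection produces. It assumes the JSON API's objects.list endpoint lives under https://storage.googleapis.com/storage/v1; the helper name build_list_url and the bucket name example-bucket are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode, quote

# Assumed root of the Cloud Storage JSON API (v1).
API_ROOT = 'https://storage.googleapis.com/storage/v1'


def build_list_url(bucket_name, fields, page_token=None):
    """Build (but do not send) an objects.list URL with a field projection.

    build_list_url is a hypothetical helper used only for illustration.
    """
    params = {'fields': fields}
    if page_token is not None:
        params['pageToken'] = page_token
    # urlencode percent-encodes the commas and parentheses in the projection.
    return '%s/b/%s/o?%s' % (API_ROOT, quote(bucket_name, safe=''),
                             urlencode(params))


url = build_list_url('example-bucket', 'nextPageToken,items(name)')
print(url)
```

Only the fields named in the projection come back in the response, so asking for name alone keeps each page as small as the API allows.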

The code for this approach looks like this:

# Assumes the google-api-python-client package is installed.
from googleapiclient import discovery


def get_credentials():
    # Your code goes here... checkout the oauth2client documentation:
    # http://google-api-python-client.googlecode.com/hg/docs/epy/oauth2client-module.html
    # Or look at some of the existing samples for how to do this
    raise NotImplementedError('supply your own credentials here')


def get_cloud_storage_service(credentials):
    return discovery.build('storage', 'v1', credentials=credentials)


def get_objects(cloud_storage, bucket_name, autopaginate=False):
    result = []
    # Actually, it turns out that request batching isn't needed in this
    # example, because the objects.list() operation returns not just
    # the URL for the object, but also its name, as well. If it had returned
    # just the URL, then that would be a case where we'd need such batching.
    projection = 'nextPageToken,items(name,selfLink)'
    request = cloud_storage.objects().list(bucket=bucket_name, fields=projection)
    while request is not None:
        response = request.execute()
        # The response is a plain dict; 'items' is absent when a page is empty.
        result.extend(response.get('items', []))
        if autopaginate:
            # list_next() returns None once there are no more pages.
            request = cloud_storage.objects().list_next(request, response)
        else:
            request = None
    return result


def main():
    credentials = get_credentials()
    cloud_storage = get_cloud_storage_service(credentials)
    bucket = '...'  # ... your bucket name ...
    for obj in get_objects(cloud_storage, bucket, autopaginate=True):
        print('name=%s, selfLink=%s' % (obj['name'], obj['selfLink']))


if __name__ == '__main__':
    main()

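You may find the Google Cloud Storage Python Example and the other API Client Library Examples helpful for learning how to do this. There are also a number of YouTube videos on the Google Developers channel, such as Accessing Google APIs: Common code walkthrough, that provide walkthroughs.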
