Question

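I am using boto and Python to store and retrieve files to and from Amazon S3. I need to get a list of the files present in a directory. I know S3 has no concept of directories, so I am phrasing my question as: how can I get a list of all file names that share the same prefix?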

For example, suppose I have the following files:

Brad/files/pdf/abc.pdf
Brad/files/pdf/abc2.pdf
Brad/files/pdf/abc3.pdf
Brad/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/

When I call foo("Brad"), it should return a list like this:

files/pdf/abc.pdf
files/pdf/abc2.pdf
files/pdf/abc3.pdf
files/pdf/abc4.pdf

What is the best way to do this?

Answer 1

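user3's approach is a purely client-side solution. I think it works well at small scale. If you have millions of objects in one bucket, though, you may end up paying for many requests and a lot of bandwidth, because every key has to be listed before it can be filtered.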

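Alternatively, you can use the delimiter and prefix parameters provided by the GET Bucket API to achieve what you need, so that S3 filters the keys on the server side. There are many examples in the documentation; see http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html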

Needless to say, you can use boto to do this.
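As a rough illustration of the server-side approach, here is a minimal sketch assuming boto 2, where Bucket.list(prefix=...) passes the prefix to S3 and pages through the results. The helper names open_bucket and list_names_with_prefix are hypothetical, and the bucket name in the usage note is only a placeholder; credentials are assumed to come from the environment or the boto config file.

```python
def open_bucket(bucket_name):
    """Connect with boto 2 and return the named bucket.

    Assumes credentials are available from the environment or boto config.
    """
    import boto  # imported lazily so the helper below can be tried without boto
    return boto.connect_s3().get_bucket(bucket_name)


def list_names_with_prefix(bucket, prefix):
    """Return key names under prefix + '/', with that prefix stripped.

    `bucket` is expected to behave like a boto 2 Bucket: its list(prefix=...)
    method yields objects that have a `.name` attribute.
    """
    full_prefix = prefix.rstrip('/') + '/'
    names = []
    for key in bucket.list(prefix=full_prefix):
        # Skip "folder" placeholder keys such as 'mybucket/files/pdf/new/'.
        if key.name.endswith('/'):
            continue
        names.append(key.name[len(full_prefix):])
    return names
```

Usage would look something like list_names_with_prefix(open_bucket('my-bucket-name'), 'Brad'). Only keys that start with 'Brad/' are returned by S3, so the cost scales with the number of matching keys rather than with the size of the whole bucket.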

Answer 2

You can use startswith and a list comprehension for this purpose, as shown below:

paths = ['Brad/files/pdf/abc.pdf', 'Brad/files/pdf/abc2.pdf', 'Brad/files/pdf/abc3.pdf', 'Brad/files/pdf/abc4.pdf', 'mybucket/files/pdf/new/', 'mybucket/files/pdf/new/abc.pdf', 'mybucket/files/pdf/2011/']

def foo(m):
    # keep only the paths that start with the prefix followed by '/'
    return [p for p in paths if p.startswith(m + '/')]

print(foo('Brad'))

Output:

['Brad/files/pdf/abc.pdf', 'Brad/files/pdf/abc2.pdf', 'Brad/files/pdf/abc3.pdf', 'Brad/files/pdf/abc4.pdf']

Using split and filter:

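def foo(m):
    # compare the first path component with the prefix;
    # list() makes this return a list on Python 3 as well
    return list(filter(lambda x: x.split('/')[0] == m, paths))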
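Both versions above return the full paths, while the question's expected output drops the leading 'Brad/'. A small variant that also strips the prefix might look like the sketch below; the name foo_relative is hypothetical, and skipping entries that end in '/' is an assumption about how "folder" placeholder keys should be treated.

```python
paths = [
    'Brad/files/pdf/abc.pdf',
    'Brad/files/pdf/abc2.pdf',
    'Brad/files/pdf/abc3.pdf',
    'Brad/files/pdf/abc4.pdf',
    'mybucket/files/pdf/new/',
    'mybucket/files/pdf/new/abc.pdf',
    'mybucket/files/pdf/2011/',
]


def foo_relative(paths, m):
    """Return the paths under m + '/', relative to that prefix."""
    prefix = m + '/'
    return [
        p[len(prefix):]          # drop the leading 'Brad/' part
        for p in paths
        if p.startswith(prefix)  # same test as the startswith version
        and not p.endswith('/')  # assumption: ignore folder placeholders
    ]


print(foo_relative(paths, 'Brad'))
# ['files/pdf/abc.pdf', 'files/pdf/abc2.pdf', 'files/pdf/abc3.pdf', 'files/pdf/abc4.pdf']
```

Like the other client-side versions, this still needs the complete list of paths in memory first.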
