I am using boto and Python with Amazon S3.
If I use
[key.name for key in list(self.bucket.list())]
then I get the keys of all the files:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/
What is the best way to:
1. either get all the folders from S3, or
2. take that list, strip the file name from the end of each key, and keep the unique folder keys?
I am thinking of doing it like this:
set([re.sub("/[^/]*$", "/", path) for path in mylist])
Answer 0 (score: 40)
Building on sethwm's answer:
To get the top-level directories:
list(bucket.list("", "/"))
To get the subdirectories of files/:
list(bucket.list("files/", "/"))
And so on.
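If you want every folder at every level in one go, a small recursive helper over these delimited listings should work. A minimal sketch (the walk_prefixes helper is not part of the original answer), assuming bucket is an already opened boto bucket:

from boto.s3.prefix import Prefix

# Recursively collect every "folder" by re-listing each Prefix as a new prefix.
def walk_prefixes(bucket, prefix=""):
    for entry in bucket.list(prefix, "/"):
        if isinstance(entry, Prefix):
            yield entry.name
            # Descend into the subdirectory we just found
            for sub in walk_prefixes(bucket, entry.name):
                yield sub

Calling list(walk_prefixes(bucket)) would then return every folder key in the bucket.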
Answer 1 (score: 16)
As pointed out in one of the comments, the approach suggested by j1m returns a Prefix object. If you are after the name/path, you can use its name attribute. For example:
import boto
import boto.s3
# Connect to the region and open the bucket (your_bucket holds the bucket name)
conn = boto.s3.connect_to_region('us-west-2')
bucket = conn.get_bucket(your_bucket)

# A delimited listing ("" prefix, "/" delimiter) yields the top-level prefixes
folders = bucket.list("", "/")
for folder in folders:
    print folder.name
Answer 2 (score: 13)
Since I don't know Python or boto, this will be an incomplete answer, but I want to comment on the basic concept in the question.
One of the other posters is right: there is no concept of a directory in S3, only flat key/value pairs. Many applications pretend that certain delimiters indicate directory entries, for example "/" or "\". Some applications go as far as putting a dummy object in place so that if the "directory" empties out, you can still see it in list results.
You don't always have to pull down the entire bucket and filter locally. S3 has the concept of a delimited LIST, in which you specify what you consider your path delimiter ("/", "\", "|", "foobar", etc.) and S3 returns virtual results to you, similar to what you want.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html (see the delimiter heading).
This API gives you one level of directories at a time. So, in your example, if you had:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/
If you passed in a LIST with prefix "" and delimiter "/", you would get back:
mybucket/files/
If you passed in a LIST with prefix "mybucket/files/" and delimiter "/", you would get back:
mybucket/files/pdf/
If you passed in a LIST with prefix "mybucket/files/pdf/" and delimiter "/", you would get back:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/2011/
If you then wanted to remove the PDF files themselves from the result set, you would be on your own to do that.
Now, how you do this in python/boto I don't know, but hopefully there is a way to pass these parameters through.
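For what it's worth, boto passes both parameters straight through on bucket.list(). A minimal sketch, assuming an already opened bucket object and the example prefix files/pdf/:

# Delimited LIST via boto: keys at this level come back as boto.s3.key.Key,
# "subdirectories" as boto.s3.prefix.Prefix; both have a .name attribute.
for entry in bucket.list(prefix="files/pdf/", delimiter="/"):
    print(entry.name)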
Answer 3 (score: 7)
Basically, there is no such thing as a folder in S3. Internally everything is stored as a key, and if the key name contains a slash character, clients may choose to display it as a folder.
With that in mind, you should first get all the keys and then use a regular expression to filter out the paths that contain a slash. The solution you have now is already a good start, as sketched below.
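A minimal sketch of that regex approach, assuming bucket is an already opened boto bucket (the folders variable name is just for illustration):

import re

# Strip the last path segment from every key and keep the unique
# "directory" prefixes; keys already ending in '/' are left unchanged.
mylist = [key.name for key in bucket.list()]
folders = set(re.sub(r"/[^/]*$", "/", path) for path in mylist)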
Answer 4 (score: 4)
I see you have successfully made the boto connection. If there is only one directory you are interested in (as in the example you provided), I think what you can do is use the prefix and delimiter options that AWS already provides (Link).
Boto uses this feature in its bucket object, and you can retrieve hierarchical directory information using prefix and delimiter. bucket.list() returns a boto.s3.bucketlistresultset.BucketListResultSet object.
I tried this a couple of ways, and if you do choose to use a delimiter= argument in bucket.list(), the returned object is an iterator over boto.s3.prefix.Prefix rather than boto.s3.key.Key. In other words, if you try to retrieve the subdirectories you should set delimiter='/' and, as a result, you will get an iterator over Prefix objects.
Both returned object types (Prefix or Key) have a .name attribute, so if you want the directory/file information as a string, you can print it like below:
from boto.s3.connection import S3Connection

key_id = '...'
secret_key = '...'

# Create connection
conn = S3Connection(key_id, secret_key)

# Get list of all buckets
allbuckets = conn.get_all_buckets()
for bucket_name in allbuckets:
    print(bucket_name)

# Connect to a specific bucket
bucket = conn.get_bucket('bucket_name')

# Get subdirectory info
for key in bucket.list(prefix='sub_directory/', delimiter='/'):
    print(key.name)
Answer 5 (score: 2)
The boto interface allows you to list the contents of a bucket and supply a prefix for the entries. That way you can get the entries that would be inside a directory on a normal filesystem:
import boto

AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# get_bucket() needs the bucket name; a placeholder is used here
bucket = conn.get_bucket('your-bucket-name')
bucket_entries = bucket.list(prefix='/path/to/your/directory')
for entry in bucket_entries:
    print entry
Answer 6 (score: 2)
As others have said, the problem here is that a folder does not necessarily have its own key, so you have to search the key strings for the / character and use it to work out your folders. Here is one way to build a recursive dictionary that mimics the folder structure.
If you want every file in a folder together with its URL:
assets = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')
    identifier = assets
    # Walk (and create as needed) the nested dicts for each intermediate segment
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except:
            identifier[uri] = {}
        identifier = identifier[uri]
    # Non-folder keys get their public URL stored under the file name
    if not key.name.endswith('/'):
        identifier[path[-1]] = key.generate_url(expires_in=0, query_auth=False)
return assets
If you only want the empty folders (the keys ending in '/'):
folders = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')
    identifier = folders
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except:
            identifier[uri] = {}
        identifier = identifier[uri]
    if key.name.endswith('/'):
        identifier[path[-1]] = {}
return folders
These can then be read out recursively later, for example as sketched below.
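A minimal sketch of such a recursive read-out (the print_tree helper is my own illustration), assuming assets is the nested dictionary built above:

# Walk the nested dict, printing folders and files with indentation.
def print_tree(node, depth=0):
    for name, value in sorted(node.items()):
        print("  " * depth + name)
        if isinstance(value, dict):
            # sub-folder: recurse one level deeper
            print_tree(value, depth + 1)
        # otherwise value is the generated URL for a file

print_tree(assets)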
Answer 7 (score: 1)
I found that the following works using boto3:
import boto3

def list_folders(s3_client, bucket_name):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix='', Delimiter='/')
    for content in response.get('CommonPrefixes', []):
        yield content.get('Prefix')

# Placeholder bucket name; replace with your own
bucket_name = 'my-bucket'
session = boto3.session.Session()
s3_client = session.client('s3')
folder_list = list_folders(s3_client, bucket_name)
for folder in folder_list:
    print('Folder found: %s' % folder)
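One caveat worth noting: list_objects_v2 returns at most 1000 entries per call, so for buckets with many top-level prefixes a paginator is safer. A minimal sketch, with the bucket name as a placeholder:

import boto3

# Paginate the delimited listing so more than 1000 common prefixes are handled.
s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        print('Folder found: %s' % prefix['Prefix'])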