I have an Amazon S3 bucket with the following structure:
bucket_name/level1/level2/level3/level4/..../somefilename1.txt, somefilename2.txt,... somefilename(n).txt
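Multiple files can sit under the root "folder".
I only need the list of level 1 and level 2 "folder" names. I do not need to drill down past level 2. In other words, I only need the list of bucket_name/level1/level2/ names. That list may exceed 2000 items.
If I use: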
s3_keys = s3_client.list_objects(Bucket=bucket, Prefix=prefix, Delimiter='/')
I do get the list I am looking for, but it is capped at 1000 records.
I googled it, and a paginator seems to be an option:
keys = []
paginator = s3_client.get_paginator('list_objects')
operation_parameters = {'Bucket': bucket,
                        'Prefix': filepath}
page_iterator = paginator.paginate(**operation_parameters)
for page in page_iterator:
    keys.append(page['Contents'])
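But this paginator approach returns every object path under the bucket, which could be hundreds of thousands of object paths.
I only need the top 2 levels of the path.
Please advise how to accomplish this. Thanks.
Example directory structure: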
my_bucket/machine1_id/part1_id/../../../..
my_bucket/machine1_id/part2_id/../../../..
.
.
my_bucket/machineN_id/part1_id/../../../..
my_bucket/machineN_id/part2_id/../../../..
.
.
my_bucket/machineN_id/part(n)_id/../../../..
.
.
my_bucket/Building1_id/Room1_size/.../../../..
my_bucket/Building1_id/Room2_size/.../../../..
.
.
my_bucket/BuildingN_id/Room1_size/.../../../..
my_bucket/BuildingN_id/Room2_size/.../../../..
.
.
my_bucket/BuildingN_id/RoomN_size/.../../../..
.
.
and so on. I just want the list of all my_bucket/1st_level/2nd_level/ entries, nothing more. In my case that can be more than 2000 items.
The list of strings I want returned looks like this:
[
"my_bucket/machine1_id/part1_id/",
"my_bucket/machine1_id/part2_id/",
.
.
"my_bucket/machineN_id/part1_id/",
"my_bucket/machineN_id/part2_id/",
.
.
"my_bucket/machineN_id/part(n)_id/",
.
.
"my_bucket/Building1_id/Room1_size/",
"my_bucket/Building1_id/Room2_size/",
.
.
"my_bucket/BuildingN_id/Room1_size/",
"my_bucket/BuildingN_id/Room2_size/",
.
.
"my_bucket/BuildingN_id/RoomN_size/",
.
.
]
Answer 0 (score: 0)
If you want to list all objects directly inside level1/level2, you can use:
import boto3
s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(
    Bucket='bucket-name',
    Delimiter='/',
    Prefix='level1/level2/',
)
for page in response_iterator:
    # 'Contents' is missing from a page that holds no objects
    for obj in page.get('Contents', []):
        print(obj['Key'])
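For what the question actually asks (only the first two levels), one possible approach is to ignore Contents and read CommonPrefixes instead, which S3 returns whenever a Delimiter is set. The following is a rough sketch, not a tested production solution. The function names list_common_prefixes and list_two_level_prefixes are made up for illustration, and the client is passed in as a parameter so the helpers do not need AWS access just to be imported. It makes one paginated listing for the bucket root plus one per level-1 prefix, so it never walks the deeper objects.

```python
def list_common_prefixes(s3_client, bucket, prefix=""):
    """Yield the immediate "sub-folder" prefixes found directly under `prefix`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/")
    for page in pages:
        # With a Delimiter set, "folders" are reported under CommonPrefixes.
        # The key is absent on pages that contain none, hence .get().
        for entry in page.get("CommonPrefixes", []):
            yield entry["Prefix"]


def list_two_level_prefixes(s3_client, bucket):
    """Return 'bucket/level1/level2/' strings for every second-level prefix."""
    result = []
    for level1 in list_common_prefixes(s3_client, bucket):
        for level2 in list_common_prefixes(s3_client, bucket, prefix=level1):
            result.append(f"{bucket}/{level2}")
    return result


# Example usage (assumes boto3 is installed and credentials are configured;
# 'my_bucket' is the placeholder bucket name from the question):
#
#   import boto3
#   client = boto3.client("s3")
#   for path in list_two_level_prefixes(client, "my_bucket"):
#       print(path)
```

Because the paginator follows continuation tokens, the 1000-entries-per-response limit mentioned in the question should not truncate the result.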
Answer 1 (score: 0)
This cannot be achieved with a native boto3 option.
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
for obj in bucket.objects.all():
    if obj.key.endswith('/'):
        print(obj.key)
This prints all the folders (in practice, every key that ends with /).
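If walking every key is acceptable, the two-level names can also be derived from the keys themselves, which does not depend on folder-marker objects ending in / actually existing in the bucket. The sketch below assumes that; two_level_prefixes is a hypothetical helper name, and note that this route still lists every object, which is what the question hoped to avoid.

```python
def two_level_prefixes(keys, bucket_name):
    """Collect distinct 'bucket/level1/level2/' strings from an iterable of keys."""
    seen = set()
    result = []
    for key in keys:
        parts = key.split("/")
        if len(parts) < 3:
            # Key sits at the root or directly under level 1: no level-2 folder.
            continue
        prefix = f"{bucket_name}/{parts[0]}/{parts[1]}/"
        if prefix not in seen:
            seen.add(prefix)
            result.append(prefix)  # keep first-seen order
    return result


# Example usage with the resource API from the answer above (assumes boto3
# is installed and credentials are configured):
#
#   import boto3
#   bucket = boto3.resource("s3").Bucket("mybucket")
#   names = two_level_prefixes((obj.key for obj in bucket.objects.all()), "mybucket")
```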