Question

Can I somehow search for objects in S3 by extension, and not only by prefix?

Here is what I have now:

ListObjectsResponse r = s3Client.ListObjects(new Amazon.S3.Model.ListObjectsRequest()
{
    BucketName = BucketName,
    Marker = marker,
    Prefix = folder, 
    MaxKeys = 1000
});

So, I need to list all *.xls files in my bucket.

Answer 1

I don't believe S3 can do this.

The best solution is to "index" your S3 contents in a database (SQL Server, MySQL, SimpleDB, etc.) and run your queries against that.
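As a rough illustration of the indexing idea, here is a minimal sketch that uses Python's built-in sqlite3 module in place of the databases named above. The table layout, the function names, and the sample keys are all assumptions made for the example; in practice you would record each key at upload time.

```python
import sqlite3
from pathlib import PurePosixPath


def build_index(keys):
    """Store each key with its lowercased extension in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, ext TEXT)")
    conn.executemany(
        "INSERT INTO objects (key, ext) VALUES (?, ?)",
        [(k, PurePosixPath(k).suffix.lower()) for k in keys],
    )
    # Index the extension column so lookups by extension stay cheap.
    conn.execute("CREATE INDEX idx_ext ON objects (ext)")
    return conn


def keys_with_extension(conn, ext):
    """Return the indexed keys whose extension equals ext (for example '.xls')."""
    rows = conn.execute(
        "SELECT key FROM objects WHERE ext = ? ORDER BY key", (ext.lower(),)
    )
    return [row[0] for row in rows]


# Hypothetical keys standing in for whatever your upload code records.
conn = build_index(["reports/q1.xls", "reports/q2.XLS", "img/logo.png"])
print(keys_with_extension(conn, ".xls"))
```

The extension is stored lowercased, so the lookup is case-insensitive even though S3 keys themselves are case-sensitive.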

Answer 2

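Although I do think the BEST answer is to use a database to keep track of your files, I also think that is an incredible pain. I was working in Python with boto3, and this is the solution I came up with.

It's not elegant, but it will work. List all the files, then filter them in code down to the ones with the "suffix"/"extension" that you want.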

import boto3

s3_client = boto3.client('s3')
bucket = 'my-bucket'
prefix = 'my-prefix/foo/bar'
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

file_names = []

for response in response_iterator:
    # 'Contents' is absent when a page has no matching objects
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.endswith('.json'):
            file_names.append(key)

print(file_names)

Answer 3

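You don't actually need a separate database to do this for you.

S3 lets you list the objects in a bucket that share a particular prefix. Your dilemma is that the ".xls" extension is at the end of the file name, so a prefix search does not help you. However, when you put the file into the bucket, you can change the object name so that the prefix contains the file type (for example: XLS-myfile.xls). You can then use the S3 API listObjects and pass the prefix "XLS".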
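A minimal sketch of that renaming step might look like the following. The helper name key_with_type_prefix is hypothetical, and it assumes flat file names without folder components.

```python
from pathlib import PurePosixPath


def key_with_type_prefix(filename):
    """Prepend the upper-cased extension so it becomes a listable prefix."""
    ext = PurePosixPath(filename).suffix.lstrip(".").upper()
    # Files without an extension are left unchanged in this sketch.
    return f"{ext}-{filename}" if ext else filename


print(key_with_type_prefix("myfile.xls"))
```

When listing, passing the prefix with the trailing hyphen ("XLS-") avoids also matching keys that start with a longer type name such as "XLSX-".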

Answer 4

I iterate over the files after fetching their information. The end result will be a list of dicts.

import os

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket_name')

# get information about all files in the bucket
files = bucket.objects.all()

# create empty list for final information
files_information = []

# your known extensions list. we will compare file names with this list
extensions = ['png', 'jpg', 'txt', 'docx']

# Iterate through 'files', convert each to a dict, and add an extension key.
for file in files:
    # splitext handles extensions of any length, such as 'docx'
    extension = os.path.splitext(file.key)[1].lstrip('.')
    if extension in extensions:
        files_information.append({'file_name': file.key, 'extension': extension})
    else:
        files_information.append({'file_name': file.key, 'extension': 'unknown'})


print(files_information)

Answer 5

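Since you are using a boto3 resource to get the objects from S3, you can get a satisfactory result by filtering the returned keys on their file extension. Like this: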

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket')
files = my_bucket.objects.all()
file_list = []
for file in files:
    if file.key.endswith('.docx'):
        file_list.append(file.key)

You can change the endswith string to whatever string you need.
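If you need several extensions at once, note that Python's str.endswith also accepts a tuple of suffixes. A small sketch, where the helper name has_extension and the sample keys are made up for illustration:

```python
def has_extension(key, extensions):
    """Return True if key ends with any of the given extensions, ignoring case."""
    return key.lower().endswith(tuple(ext.lower() for ext in extensions))


keys = ["a/report.XLS", "a/report.xlsx", "a/notes.txt"]  # hypothetical keys
print([k for k in keys if has_extension(k, [".xls", ".xlsx"])])
```

Lowercasing both sides is a design choice for this sketch; drop it if your keys must match case exactly.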

Answer 6

You can easily list all elements by extension: fetch all the elements (including folders), then filter with key.endswith('...').

In this case, I filter every element using the prefix (test_dir) and then show only the elements with the .zicu extension.
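A minimal sketch of what such a prefix-plus-extension filter could look like with boto3 follows. This is not the answerer's original code; the function names and the bucket name are placeholders chosen for the example.

```python
def filter_keys(pages, extension):
    """Yield keys from list_objects_v2-style pages that end with extension."""
    for page in pages:
        # 'Contents' is missing from a page that has no objects.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(extension):
                yield obj["Key"]


def list_by_extension(bucket, prefix, extension):
    """List keys under prefix in bucket that end with extension."""
    import boto3  # imported here so filter_keys can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    return list(filter_keys(pages, extension))


# Example call (bucket name is a placeholder):
# list_by_extension("my-bucket", "test_dir", ".zicu")
```

The prefix is applied by S3 on the server side, while the extension check runs on the client, so every key under the prefix is still listed and paid for in requests.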
