I'm writing a script to parse files in an S3 bucket without downloading them locally. The code works fine until it hits a file that has been archived to Glacier. For now I'm going to add exception handling (I promise the error handling looks better in the real code), but ideally I'd like to see whether it's possible to filter out the Glacier files up front.
Here is my code:
import boto3
import gzip

try:
    s3_client = boto3.client('s3')
    bucket = 'my_bucket'
    prefix = 'path_to_file/file_name.csv.gz'
    obj = s3_client.get_object(Bucket=bucket, Key=prefix)
    body = obj['Body']
    with gzip.open(body, 'rt') as gf:
        for ln in gf:
            print(ln)
except Exception as e:
    print(e)
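For the interim exception handling, one hedged approach (an assumption about the error shape, not verified against every SDK version): boto3 raises `botocore.exceptions.ClientError`, whose `.response` dict carries an error code, and a GET against a Glacier-archived object is typically reported as `InvalidObjectState`. A minimal sketch of classifying that response shape, where `is_glacier_error` is a hypothetical helper, not part of boto3:

```python
# Sketch: decide whether a failed GetObject was caused by a Glacier-archived
# object, assuming the failure surfaces with error code "InvalidObjectState".
# `is_glacier_error` is a hypothetical helper, not a boto3 API.
def is_glacier_error(error_response):
    """Return True if a ClientError-style response dict names InvalidObjectState."""
    code = error_response.get("Error", {}).get("Code")
    return code == "InvalidObjectState"

# The same dict shape that botocore attaches to ClientError.response:
glacier_resp = {"Error": {"Code": "InvalidObjectState",
                          "Message": "The operation is not valid for the object's storage class"}}
missing_resp = {"Error": {"Code": "NoSuchKey", "Message": "Not found"}}

print(is_glacier_error(glacier_resp))  # → True
print(is_glacier_error(missing_resp))  # → False
```

In the real script you would catch `ClientError`, pass `e.response` to a check like this, and skip or restore the object accordingly.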
I've seen that with the AWS CLI you can at least sort objects so that the Glacier files end up at the bottom, so there must be a way to sort or filter them in boto3 as well:
aws s3api list-objects --bucket my-bucket --query "reverse(sort_by(Contents,&LastModified))"
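The `reverse(sort_by(...))` that the CLI's JMESPath query performs can be reproduced in plain Python on the `Contents` list that `list_objects` returns. A small sketch on sample data (keys, dates, and storage classes here are invented for illustration):

```python
from datetime import datetime

# Sample entries shaped like the "Contents" items boto3 returns
# from list_objects (all values invented for this sketch).
contents = [
    {"Key": "a.csv.gz", "LastModified": datetime(2020, 1, 5), "StorageClass": "GLACIER"},
    {"Key": "b.csv.gz", "LastModified": datetime(2020, 3, 1), "StorageClass": "STANDARD"},
    {"Key": "c.csv.gz", "LastModified": datetime(2020, 2, 9), "StorageClass": "STANDARD"},
]

# Equivalent of reverse(sort_by(Contents, &LastModified)): newest first.
newest_first = sorted(contents, key=lambda o: o["LastModified"], reverse=True)
print([o["Key"] for o in newest_first])  # → ['b.csv.gz', 'c.csv.gz', 'a.csv.gz']
```

Sorting only pushes the Glacier entries around, though; filtering on `StorageClass` (as in the answer below the CLI approach) drops them outright.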
Answer 0 (score: 0)
Solved by checking StorageClass == 'STANDARD' (vs. == 'GLACIER'):
import boto3
import gzip

bucket = 'my_bucket'
prefix = 'path/to/files/'
s3_client = boto3.client('s3')
response = s3_client.list_objects(Bucket=bucket, Prefix=prefix)
for file in response['Contents']:
    if file['StorageClass'] == 'STANDARD':
        name = file['Key'].rsplit('/', 1)
        if name[1] != '':
            file_name = name[1]
            obj = s3_client.get_object(Bucket=bucket, Key=prefix + file_name)
            body = obj['Body']
            lns = []
            i = 0
            with gzip.open(body, 'rt') as gf:
                for ln in gf:
                    i += 1
                    lns.append(ln.rstrip())
                    if i == 10:
                        break
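The first-ten-lines logic at the end can be exercised without S3 at all, since `gzip.open` accepts any file-like object (the streaming `Body` above, or an in-memory buffer here). A self-contained sketch using `itertools.islice` in place of the manual counter; `head_gzip` is a hypothetical helper, not part of any library:

```python
import gzip
import io
from itertools import islice

def head_gzip(fileobj, n=10):
    """Return the first n text lines of a gzip stream, newline-stripped."""
    with gzip.open(fileobj, "rt") as gf:
        return [ln.rstrip("\n") for ln in islice(gf, n)]

# Build a small gzip payload in memory to stand in for the S3 Body.
payload = "\n".join(f"row{i}" for i in range(25)).encode()
buf = io.BytesIO(gzip.compress(payload))
first_ten = head_gzip(buf, 10)
print(first_ten)  # rows 'row0' through 'row9'
```

One caveat worth noting: `list_objects` returns at most 1000 keys per call, so for larger prefixes the loop above would need pagination (boto3 provides paginators for this) before the StorageClass filter sees every object.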