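I am writing a Python script to download the latest file from a folder in an S3 bucket. I know how to download the latest file object from an S3 bucket, but the file I want is inside a folder within the bucket. I have no idea how to do this or where to add it to my code. I tried appending the path to the end of the bucket link, but that does not seem to work.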
Answer 0 (score: 0)
Credit is given in the docstring; this is only a small modification of an earlier answer.
import os

import boto3


def download_latest_in_dir(prefix, local, bucket, client=None, nLatest=2):
    """
    Adapted from https://stackoverflow.com/questions/31918960/boto3-to-download-all-files-from-a-s3-bucket/31929277

    params:
    - prefix: key prefix ("folder") to match in S3
    - local: local path to folder in which to place files
    - bucket: S3 bucket with target contents
    - client: initialized S3 client object (created on demand if None)
    - nLatest: number of the most recent files to fetch from AWS

    Example: download the two latest files from the AWS directory ieee-temp/sst to the local directory /home/hu-mka/Downloads/sst
        download_latest_in_dir(prefix='sst', local='/home/hu-mka/Downloads', bucket='ieee-temp', client=boto3.client('s3'), nLatest=2)
    """
    if client is None:
        # Create the client at call time rather than when the module is imported
        client = boto3.client('s3')
    files = []
    times = []
    next_token = ''
    base_kwargs = {
        'Bucket': bucket,
        'Prefix': prefix,
    }
    ipage = 0
    # Listings are paginated, so keep following the continuation token
    while next_token is not None:
        kwargs = base_kwargs.copy()
        if next_token != '':
            kwargs.update({'ContinuationToken': next_token})
        results = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when nothing matches the prefix
        contents = results.get('Contents', [])
        for i in contents:
            k = i.get('Key')
            if k[-1] != '/':
                files.append(k)
                times.append(i.get('LastModified'))
            else:
                print(f"Warning: skipping a sub-directory placeholder: {k}")
        if files:
            print(f"Read page {ipage}, last item: {files[-1]}, its timestamp: {times[-1]}")
        next_token = results.get('NextContinuationToken')
        ipage += 1
    # https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list
    time_sorted_filenames = [x for _, x in sorted(zip(times, files))]
    # The newest keys sit at the end of the sorted list
    for k in time_sorted_filenames[-nLatest:]:
        dest_pathname = os.path.join(local, k)
        os.makedirs(os.path.dirname(dest_pathname), exist_ok=True)
        client.download_file(bucket, k, dest_pathname)
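
The selection step in the function above (sort the listed keys by LastModified and keep the newest ones) can be pulled out into a small pure function, which makes it easy to check without touching AWS. The following is only a minimal sketch: the helper name `latest_keys` and the sample entries are hypothetical, and the dictionaries merely imitate the `Key` and `LastModified` fields found in the `Contents` list returned by `list_objects_v2`.

```python
from datetime import datetime, timezone


def latest_keys(contents, n=1):
    """Return the keys of the n most recently modified objects, oldest first.

    `contents` is a list of dicts with 'Key' and 'LastModified' entries.
    Keys ending in '/' (folder placeholders) are ignored.
    """
    if n <= 0:
        return []
    objects = [o for o in contents if not o["Key"].endswith("/")]
    objects.sort(key=lambda o: o["LastModified"])
    return [o["Key"] for o in objects[-n:]]


# Hypothetical listing for the prefix 'sst/' (made-up keys and dates)
sample = [
    {"Key": "sst/", "LastModified": datetime(2021, 1, 4, tzinfo=timezone.utc)},
    {"Key": "sst/a.csv", "LastModified": datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {"Key": "sst/c.csv", "LastModified": datetime(2021, 1, 3, tzinfo=timezone.utc)},
    {"Key": "sst/b.csv", "LastModified": datetime(2021, 1, 2, tzinfo=timezone.utc)},
]

print(latest_keys(sample, n=2))
```

Each returned key could then be passed to `client.download_file(bucket, key, dest_pathname)` as in the function above. Sorting on the `LastModified` value alone also avoids comparing file names when two objects share a timestamp.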