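I want to download files from a specific S3 bucket based on each file's last modified date.
I have researched how to connect with boto3, and there is plenty of code and documentation for downloading files unconditionally. Here is my pseudocode: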
import boto3

def download_file_s3(bucket_name, modified_date):
    # connect to the S3 resource
    s3 = boto3.resource('s3', aws_access_key_id='demo', aws_secret_access_key='demo')
    # connect to the desired bucket
    my_bucket = s3.Bucket(bucket_name)
    # get files
    for file in my_bucket.objects.all():
        ...  # TODO: keep only the files whose last_modified matches modified_date
I would like to complete this function: given a modified date, it should return the files in the S3 bucket that have that modified date.
Answer 0 (score: 1)
Here is my test code. It prints the last_modified datetime of every object modified after the datetime I set.
import boto3
from datetime import datetime, timezone

s3 = boto3.resource('s3')
response = s3.Bucket('<bucket name>').objects.all()

# last_modified is timezone-aware (UTC), so the cutoff must be too
cutoff = datetime(2019, 8, 1, 0, 0, 0, tzinfo=timezone.utc)

for item in response:
    # each object summary already carries last_modified
    if item.last_modified > cutoff:
        print(item.last_modified)
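The `tzinfo=timezone.utc` argument in that comparison matters: `last_modified` values are timezone-aware, and Python refuses to order an aware datetime against a naive one. The following is a minimal standard-library sketch of that behaviour. `is_newer` is a hypothetical helper name, the sample timestamp is a stand-in rather than real bucket data, and treating a naive cutoff as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def is_newer(last_modified, cutoff):
    """Return True if last_modified is strictly after cutoff.

    A naive cutoff is assumed to mean UTC (an assumption of this sketch).
    """
    if cutoff.tzinfo is None:
        cutoff = cutoff.replace(tzinfo=timezone.utc)
    return last_modified > cutoff


# Stand-in for an S3 last_modified value (timezone-aware, UTC).
sample = datetime(2019, 8, 27, 7, 13, 4, tzinfo=timezone.utc)

try:
    sample > datetime(2019, 8, 1)  # naive cutoff: comparison is rejected
except TypeError as exc:
    print("naive comparison failed:", exc)

# The helper normalizes the cutoff first, so this comparison succeeds.
print(is_newer(sample, datetime(2019, 8, 1)))
```

Normalizing the cutoff in one place keeps the loop body free of timezone handling.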
If you have a specific date, then:
import boto3

s3 = boto3.resource('s3')
response = s3.Bucket('<bucket name>').objects.all()

date = '20190827'  # input('Insert Date as a form YYYYmmdd')

for item in response:
    # compare only the date part of last_modified, formatted as YYYYmmdd
    if item.last_modified.strftime('%Y%m%d') == date:
        print(item.last_modified)
This gives a result like the following.
2019-08-27 07:13:04+00:00
2019-08-27 07:13:36+00:00
2019-08-27 07:13:39+00:00
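To get from printing timestamps to the function the question asks for, the date test can be pulled into a small filter and followed by a download step. This is a sketch under stated assumptions, not a definitive implementation: `keys_modified_on` and `download_files_modified_on` are hypothetical names, credentials are assumed to come from the default boto3 credential chain, and dates are compared in UTC. The stand-in objects at the bottom only imitate the `key` and `last_modified` attributes so the filter can be tried without AWS access.

```python
from datetime import date, datetime, timezone
from pathlib import Path
from types import SimpleNamespace


def keys_modified_on(objects, day):
    """Return the keys of objects whose last_modified falls on `day` (UTC)."""
    return [
        obj.key
        for obj in objects
        if obj.last_modified.astimezone(timezone.utc).date() == day
    ]


def download_files_modified_on(bucket_name, day, dest_dir="."):
    """Download every object in the bucket last modified on `day` (UTC)."""
    import boto3  # imported lazily so the filter above works without boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    downloaded = []
    for key in keys_modified_on(bucket.objects.all(), day):
        if key.endswith("/"):
            continue  # skip "folder" placeholder keys
        target = Path(dest_dir) / key
        target.parent.mkdir(parents=True, exist_ok=True)
        bucket.download_file(key, str(target))
        downloaded.append(target)
    return downloaded


# Offline check of the filter with stand-in objects (no AWS access needed).
fake_objects = [
    SimpleNamespace(
        key="a.pdf",
        last_modified=datetime(2019, 8, 27, 7, 13, 4, tzinfo=timezone.utc),
    ),
    SimpleNamespace(
        key="b.pdf",
        last_modified=datetime(2019, 8, 26, 23, 59, 59, tzinfo=timezone.utc),
    ),
]
print(keys_modified_on(fake_objects, date(2019, 8, 27)))
```

A real call would look like `download_files_modified_on('<bucket name>', date(2019, 8, 27), 'downloads')`.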
Answer 1 (score: 1)
I have a better solution: a function that does this automatically. Just pass in the bucket name and the download path.
from datetime import date, timedelta
import re

from boto3.session import Session


def Download_pdf_specifc_date_subfolder(bucket_name, download_path):
    ACCESS_KEY = 'XYZ'
    SECRET_KEY = 'ABC'

    # create a session
    session = Session(aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)
    s3 = session.resource('s3')
    bucket = s3.Bucket(bucket_name)

    # get yesterday's date as YYYY-MM-DD
    yesterday = date.today() - timedelta(days=1)
    x = yesterday.strftime('%Y-%m-%d')
    print(x)

    # collect the keys that need to be downloaded
    files_to_downloaded = []
    # go through every object in the bucket
    for fileObject in bucket.objects.all():
        file_name = str(fileObject.key)
        # str(last_modified) looks like '2019-08-27 07:13:04+00:00'; keep the date part
        last_modified = str(fileObject.last_modified).split()
        if last_modified[0] == x:
            # replace Airports with the name of the subfolder you want to filter on
            if re.findall(r"Airports/[a-zA-Z]+", file_name):
                files_to_downloaded.append(file_name)

    # download the collected keys into the target folder
    for file_name in files_to_downloaded:
        print(file_name)
        # download_path must end with a path separator and the folder must exist
        d_path = download_path + file_name
        print(d_path)
        bucket.download_file(file_name, d_path)


# bucket_name and download_path must be defined before this call
Download_pdf_specifc_date_subfolder(bucket_name, download_path)
In the end, the function prints the matching keys and downloads those files into the specified folder.
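A possible variation on this answer, sketched under a few assumptions: the subfolder is selected with `bucket.objects.filter(Prefix=...)` instead of a regex (this matches only keys that start with the prefix, whereas the regex matches `Airports/` anywhere in the key), "yesterday" is computed in UTC to match the UTC `last_modified` timestamps, and local directories are created before each download. The names `utc_yesterday`, `local_path`, and `download_yesterdays_files` are hypothetical, and credentials are assumed to come from the default boto3 credential chain.

```python
import os
from datetime import datetime, timedelta, timezone


def utc_yesterday(now=None):
    """Return yesterday's date in UTC; `now` can be injected for testing."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=1)).date()


def local_path(download_dir, key):
    """Map an S3 key such as 'Airports/x.pdf' to a path under download_dir."""
    return os.path.join(download_dir, *key.split("/"))


def download_yesterdays_files(bucket_name, download_dir, prefix="Airports/"):
    """Download objects under `prefix` that were last modified yesterday (UTC)."""
    import boto3  # imported lazily so the helpers above work without boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    target_day = utc_yesterday()
    for obj in bucket.objects.filter(Prefix=prefix):
        modified_day = obj.last_modified.astimezone(timezone.utc).date()
        if obj.key.endswith("/") or modified_day != target_day:
            continue  # skip "folder" placeholders and other dates
        path = local_path(download_dir, obj.key)
        # make sure the local subfolder exists before writing the file
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        bucket.download_file(obj.key, path)
```

Filtering by prefix on the server side means only the keys in that subfolder are listed, which can matter for large buckets.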