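I'm trying to download a file from S3 using boto, but only if the local copy of the file is older than the remote file.
I'm using the 'If-Modified-Since' header and the code below: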
#!/usr/bin/python
import os
import datetime
import boto
from boto.s3.key import Key

bucket_name = 'my-bucket'
conn = boto.connect_s3()
bucket = conn.get_bucket(bucket_name)

def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    if os.path.isfile(filename):
        print("File exists, adding If-Modified-Since header")
        # Use the local file's modification time (as UTC) for the conditional GET
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        key.get_contents_to_filename(filename, headers)
    except boto.exception.S3ResponseError as e:
        return 304
    return 200

print(download(bucket, 'README'))
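The problem: when the local file does not exist, everything works and the file is downloaded. When I run the script a second time, my function returns 304 as expected, but the previously downloaded file is deleted.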
Answer 0 (score: 8)
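boto.s3.key.Key.get_contents_to_filename opens the file in wb mode, which truncates the file at the very start of the function (see boto/s3/key.py). On top of that, it removes the file when an exception is raised.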
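The truncation part is ordinary file-mode behaviour and can be seen without S3 at all. Here is a minimal sketch, assuming a throwaway temporary file (the path and contents are made up for illustration and have nothing to do with boto itself):

```python
import os
import tempfile

# Hypothetical scratch file standing in for the previously downloaded copy.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(b'previous download')

# Opening with 'wb' empties the file immediately, before anything is written.
with open(path, 'wb'):
    size_after_wb = os.path.getsize(path)

# Restore the contents, then open with 'r+b': the existing bytes are kept.
with open(path, 'wb') as f:
    f.write(b'previous download')
with open(path, 'r+b'):
    size_after_rplusb = os.path.getsize(path)

os.remove(path)
print(size_after_wb, size_after_rplusb)
```

So even if the request ends in a 304 and nothing is written, a file opened in wb mode has already lost its contents.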
Instead of get_contents_to_filename, you can use get_contents_to_file with a file you open yourself in a different mode:
def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    mode = 'wb'
    updating = False
    if os.path.isfile(filename):
        mode = 'r+b'
        updating = True
        print("File exists, adding If-Modified-Since header")
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        with open(filename, mode) as f:
            key.get_contents_to_file(f, headers)
            f.truncate()
    except boto.exception.S3ResponseError as e:
        if not updating:
            # got an error and we are not updating an existing file
            # delete the file that was created due to mode = 'wb'
            os.remove(filename)
        return e.status
    return 200
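Note: file.truncate is there to handle the case where the new file is smaller than the previous one. In r+b mode the old contents are overwritten in place, so without it any leftover bytes from the old file would remain at the end.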
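To see why the truncate call matters, here is a small sketch that mimics the update path with a local file instead of S3. The helper name overwrite and the sample contents are hypothetical and only serve the illustration:

```python
import os
import tempfile

def overwrite(path, data, truncate):
    """Write data over an existing file opened in 'r+b' mode, then read it back."""
    with open(path, 'r+b') as f:
        f.write(data)
        if truncate:
            # Cut the file at the current position, dropping old trailing bytes.
            f.truncate()
    with open(path, 'rb') as f:
        return f.read()

# Hypothetical scratch file standing in for the local copy.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, 'wb') as f:
    f.write(b'old longer content')
without_truncate = overwrite(path, b'new', truncate=False)

with open(path, 'wb') as f:
    f.write(b'old longer content')
with_truncate = overwrite(path, b'new', truncate=True)

os.remove(path)
print(without_truncate)
print(with_truncate)
```

Without the truncate call the shorter payload only replaces the first bytes and the tail of the old file survives; with it, the file ends where the new data ends.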