I'm trying to upload a file of about 1 GB in size to Amazon Glacier. Somewhat arbitrarily, I decided to break it into 32 MB parts and upload them serially.
The code below produces an error about the first byte range.
import math
import boto3
from botocore.utils import calculate_tree_hash

client = boto3.client('glacier')
vault_name = 'my-vault'
size = 1073745600  # in bytes
size_mb = size / (2**20)  # Convert to megabytes for readability
local_file = 'filename'

multi_up = client.initiate_multipart_upload(vaultName=vault_name,
                                            archiveDescription=local_file,
                                            partSize=str(2**25))  # 32 mb in bytes
parts = math.floor(size_mb / 32)

with open("/Users/alexchase/Desktop/{}".format(local_file), 'rb') as upload:
    for p in range(parts):
        # Calculate lower and upper bounds for the byte ranges. The last range
        # is bigger than the ones that come before.
        lower = (p * (2**25))
        upper = (((p + 1) * (2**25)) - 1) if (p + 1 < parts) else (size)
        up_part = client.upload_multipart_part(vaultName=vault_name,
                                               uploadId=multi_up['uploadId'],
                                               range='bytes {}-{}/*'.format(lower, upper),
                                               body=upload)
    checksum = calculate_tree_hash(upload)
    complete_up = client.complete_multipart_upload(archiveSize=str(size),
                                                   checksum=checksum,
                                                   uploadId=multi_up['uploadId'],
                                                   vaultName=vault_name)
Can anyone see what I'm doing wrong?
Answer 0 (score: 3)
@Michael-sqlbot is right: the problem with Content-Range was that I was passing the whole file instead of just one part of it. I fixed that by using the read() method, but then I found a separate problem: according to the docs, the final part must be the same size as or smaller than the parts that come before it. That means using math.ceil() instead of math.floor() to determine the number of parts.
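To see why the part count changes, here is a minimal sketch of the arithmetic, using the file size from the question:

import math

size = 1073745600         # bytes; a hair over 1 GiB (1073741824 bytes)
size_mb = size / (2**20)  # just over 1024 MB

print(math.floor(size_mb / 32))  # 32 -- the trailing partial part is dropped
print(math.ceil(size_mb / 32))   # 33 -- a smaller final part covers the tail

With math.floor(), the last few kilobytes beyond the 32nd part would never be uploaded; math.ceil() adds a final, smaller part that covers them.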
The working code is:
import math
import boto3
from botocore.utils import calculate_tree_hash

client = boto3.client('glacier')
vault_name = 'my-vault'
size = 1073745600  # in bytes
size_mb = size / (2**20)  # Convert to megabytes for readability
local_file = 'filename'
partSize = (2**25)

multi_up = client.initiate_multipart_upload(vaultName=vault_name,
                                            archiveDescription=local_file,
                                            partSize=str(partSize))  # 32 mb in bytes
parts = math.ceil(size_mb / 32)  # The number of <=32mb parts we need

with open("/Users/alexchase/Desktop/{}".format(local_file), 'rb') as upload:
    for p in range(parts):
        # Calculate lower and upper bounds for the byte ranges. The last range
        # is now smaller than the ones that come before.
        lower = (p * (partSize))
        upper = (((p + 1) * (partSize)) - 1) if (p + 1 < parts) else (size - 1)
        read_size = upper - lower + 1
        file_part = upload.read(read_size)
        up_part = client.upload_multipart_part(vaultName=vault_name,
                                               uploadId=multi_up['uploadId'],
                                               range='bytes {}-{}/*'.format(lower, upper),
                                               body=file_part)
    checksum = calculate_tree_hash(upload)
    complete_up = client.complete_multipart_upload(archiveSize=str(size),
                                                   checksum=checksum,
                                                   uploadId=multi_up['uploadId'],
                                                   vaultName=vault_name)
Answer 1 (score: 2)
Content-Range: bytes 0-33554431/* is incompatible with Content-Length: 1073745600
You're telling the API that you're sending the first 32 MiB, but you're actually sending (proposing to send) the entire file, because body=upload and upload isn't just the first part, it's the entire file. The Content-Length refers to the size of this part's upload, which should be 33554432 (32 MiB).

The docs are admittedly ambiguous...

body (bytes or seekable file-like object) -- The data to upload.

...but "the data to upload" seems to refer only to the data for this part, in spite of the word "seekable".
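To make that concrete, here is a minimal sketch of the fix, using the variable names from the question (the surrounding loop and the lower/upper bounds are assumed to be set up as in the original code):

# Read exactly one part's worth of bytes and send only that slice.
# The file handle advances with each read, so successive iterations
# pick up successive parts of the file.
file_part = upload.read(2**25)  # at most 32 MiB per part
up_part = client.upload_multipart_part(vaultName=vault_name,
                                       uploadId=multi_up['uploadId'],
                                       range='bytes {}-{}/*'.format(lower, upper),
                                       body=file_part)

With body=file_part, the Content-Length of each request matches the declared byte range instead of the size of the whole archive.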
Answer 2 (score: 0)
Since Alex's follow-up answer says it "works", I'm posting another version that worked for me, on Python 3.5 and Ubuntu 16.04. I've also added some environment variables from our production end-to-end solution.

The original post gave me an error, so I tweaked it and cleaned it up a bit. Hopefully this helps someone who needs this Glacier functionality. Using a shell script with awscli commands wasn't as clean.
import math
import boto3
import os
from botocore.utils import calculate_tree_hash

vault_name = os.getenv('GLACIER_VAULT_NAME')
file_name = os.getenv('GLACIER_UPLOAD_FILE')

if vault_name is None:
    print('GLACIER_VAULT_NAME environment variable is required. Exiting.')
    exit(1)

if file_name is None:
    print('GLACIER_UPLOAD_FILE environment variable is required. Exiting.')
    exit(2)

chunk_size = 2 ** 25
client = boto3.client('glacier')
client.create_vault(vaultName=vault_name)

upload_obj = client.initiate_multipart_upload(vaultName=vault_name,
                                              archiveDescription=file_name,
                                              partSize=str(chunk_size))
file_size = os.path.getsize(file_name)
parts = math.ceil(file_size / chunk_size)

with open(file_name, 'rb') as upload:
    for p in range(parts):
        lower = p * chunk_size
        upper = lower + chunk_size - 1
        if upper >= file_size:  # the final part usually falls short of a full chunk
            upper = file_size - 1
        file_part = upload.read(chunk_size)
        up_part = client.upload_multipart_part(vaultName=vault_name,
                                               uploadId=upload_obj['uploadId'],
                                               range='bytes {}-{}/{}'.format(lower,
                                                                             upper,
                                                                             file_size),
                                               body=file_part)

# this needs a new file handler because calculate_tree_hash() processes
# the handler in a similar way to the loop above
checksum = calculate_tree_hash(open(file_name, 'rb'))

complete_up = client.complete_multipart_upload(vaultName=vault_name,
                                               uploadId=upload_obj['uploadId'],
                                               archiveSize=str(file_size),
                                               checksum=checksum)
print(complete_up)
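As an aside, reopening the file is one way to deal with the exhausted handle. Since calculate_tree_hash() appears to read from the handle's current position to EOF (it does not rewind on its own), seeking the existing handle back to the start before hashing should work as well, and avoids leaving the second handle unclosed; a minimal sketch:

with open(file_name, 'rb') as upload:
    # ... upload loop as above ...
    upload.seek(0)  # the loop left the handle at EOF; rewind before hashing
    checksum = calculate_tree_hash(upload)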