Multipart upload to Amazon Glacier: Content-Range incompatible with Content-Length

Date: 2017-07-04 18:56:14

Tags: python amazon-web-services boto3 amazon-glacier

I'm trying to upload a 1 GB file to Amazon Glacier. Somewhat arbitrarily, I decided to break it into 32 MB parts and upload them serially.


This produces an error about the first byte range.

import math
import boto3
from botocore.utils import calculate_tree_hash

client = boto3.client('glacier')
vault_name = 'my-vault'
size = 1073745600 # in bytes
size_mb = size / (2**20) # Convert to megabytes for readability
local_file = 'filename'

multi_up = client.initiate_multipart_upload(vaultName=vault_name,
                                        archiveDescription=local_file,
                                        partSize=str(2**25)) # 32 mb in bytes
parts = math.floor(size_mb / 32)
with open("/Users/alexchase/Desktop/{}".format(local_file), 'rb') as upload:
    for p in range(parts):
        # Calculate lower and upper bounds for the byte ranges. The last range
        # is bigger than the ones that come before.
        lower = (p * (2**25))
        upper = (((p + 1) * (2**25)) - 1) if (p + 1 < parts) else (size)
        up_part = client.upload_multipart_part(vaultName=vault_name,
                                           uploadId=multi_up['uploadId'],
                                           range='bytes {}-{}/*'.format(lower, upper),
                                           body=upload)
checksum = calculate_tree_hash(upload)
complete_up = client.complete_multipart_upload(archiveSize=str(size),
                                               checksum=checksum,
                                               uploadId=multi_up['uploadId'],
                                               vaultName=vault_name)

Can anyone see what I'm doing wrong?

3 Answers:

Answer 0 (score: 3)

@Michael-sqlbot is right: the problem with Content-Range was that I was passing the whole file instead of one part of it. I fixed that with the read() method, but then I discovered a separate problem, namely that (per the docs) the final part must be the same size as or smaller than the parts before it. This means using math.ceil() instead of math.floor() to determine the number of parts.
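The floor()/ceil() distinction matters whenever the file size is not an exact multiple of the part size, which is the case here. A quick check of the arithmetic (variable names in this sketch are illustrative):

```python
import math

size = 1073745600    # archive size in bytes, from the question
part_size = 2 ** 25  # 33554432 bytes, i.e. 32 MiB

floor_parts = math.floor(size / part_size)  # 32 -- silently drops the 3776-byte tail
ceil_parts = math.ceil(size / part_size)    # 33 -- includes a final short part
```

With floor() the last 3776 bytes of the archive would never be uploaded, and the declared archiveSize would not match the parts actually sent.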

The working code is:

import math
import boto3
from botocore.utils import calculate_tree_hash

client = boto3.client('glacier')
vault_name = 'my-vault'
size = 1073745600 # in bytes
size_mb = size / (2**20) # Convert to megabytes for readability
local_file = 'filename'
partSize=(2**25)

multi_up = client.initiate_multipart_upload(vaultName=vault_name,
                                        archiveDescription=local_file,
                                        partSize=str(partSize)) # 32 mb in bytes
parts = math.ceil(size_mb / 32) # The number of <=32mb parts we need
with open("/Users/alexchase/Desktop/{}".format(local_file), 'rb') as upload:
    for p in range(parts):
        # Calculate lower and upper bounds for the byte ranges. The last range
        # is now smaller than the ones that come before.
        lower = (p * (partSize))
        upper = (((p + 1) * (partSize)) - 1) if (p + 1 < parts) else (size-1)
        read_size = upper-lower+1
        file_part = upload.read(read_size)
        up_part = client.upload_multipart_part(vaultName=vault_name,
                                           uploadId=multi_up['uploadId'],
                                           range='bytes {}-{}/*'.format(lower, upper),
                                           body=file_part)
checksum = calculate_tree_hash(upload)
complete_up = client.complete_multipart_upload(archiveSize=str(size),
                                               checksum=checksum,
                                               uploadId=multi_up['uploadId'],
                                               vaultName=vault_name)

Answer 1 (score: 2)

Content-Range: bytes 0-33554431/* is incompatible with Content-Length: 1073745600

You're telling the API that you're sending the first 32 MiB, but you're actually sending (proposing to send) the entire file, because body=upload and upload is not just the first part, it's the whole file. Content-Length refers to the size of this part upload, which should be 33554432 (32 MiB).

The docs are admittedly ambiguous...

body (bytes or seekable file-like object) -- The data to upload.

...but "the data to upload" appears to refer only to this part's data, despite the word "seekable".
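In other words, each upload_multipart_part call should receive only that part's bytes. A minimal sketch of slicing a seekable file-like object into parts (shown with an in-memory io.BytesIO stand-in and a hypothetical tiny part size, purely for illustration):

```python
import io

part_size = 4                     # hypothetical tiny part size, for illustration
data = io.BytesIO(b'abcdefghij')  # stands in for the open archive file

parts = []
while True:
    lower = data.tell()             # offset before the read
    chunk = data.read(part_size)    # at most one part's worth of bytes
    if not chunk:
        break
    upper = lower + len(chunk) - 1  # inclusive upper bound of the byte range
    parts.append(('bytes {}-{}/*'.format(lower, upper), chunk))
```

Each tuple pairs a Content-Range value with a body whose length matches it exactly; the final part (bytes 8-9 here) is shorter than the rest, which is what the API expects.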

Answer 2 (score: 0)

Since Alex's follow-up answer says it "works", I'm posting another version that works for me with Python 3.5 and Ubuntu 16.04. I also added some environment variables, from a production end-to-end solution.

The original post gave me an error, so I tweaked it and cleaned it up a bit. Hopefully this helps someone who needs this Glacier functionality. Using a shell script with awscli commands wasn't as clean.

import math
import boto3
import os
from botocore.utils import calculate_tree_hash

vault_name = os.getenv('GLACIER_VAULT_NAME')
file_name = os.getenv('GLACIER_UPLOAD_FILE')

if vault_name is None:
    print('GLACIER_VAULT_NAME environment variable is required. Exiting.')
    exit(1)
if file_name is None:
    print('GLACIER_UPLOAD_FILE environment variable is required. Exiting.')
    exit(2)

chunk_size = 2 ** 25
client = boto3.client('glacier')

client.create_vault(vaultName=vault_name)

upload_obj = client.initiate_multipart_upload(vaultName=vault_name,
                                              archiveDescription=file_name,
                                              partSize=str(chunk_size))
file_size = os.path.getsize(file_name)
parts = math.ceil(file_size / chunk_size)

with open(file_name, 'rb') as upload:
    for p in range(parts):
        lower = p * chunk_size
        upper = lower + chunk_size - 1

        if upper >= file_size:  # clamp the final (short) part's upper bound
            upper = file_size - 1

        file_part = upload.read(chunk_size)

        up_part = client.upload_multipart_part(vaultName=vault_name,
                                               uploadId=upload_obj['uploadId'],
                                               range='bytes {}-{}/{}'.format(lower,
                                                                             upper,
                                                                             file_size),
                                               body=file_part)

# this needs a new file handler because calculate_tree_hash() processes 
# the handler in a similar way to the loop above
checksum = calculate_tree_hash(open(file_name, 'rb'))
complete_up = client.complete_multipart_upload(vaultName=vault_name,
                                               uploadId=upload_obj['uploadId'],
                                               archiveSize=str(file_size),
                                               checksum=checksum)

print(complete_up)
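For context, calculate_tree_hash implements the SHA-256 tree hash that Glacier documents: the data is hashed in 1 MiB chunks, and the chunk digests are combined pairwise until a single root digest remains. A stdlib-only sketch of that algorithm, to show why the hash depends on consuming the whole stream (the botocore helper remains the safer choice in practice):

```python
import hashlib

MIB = 1024 * 1024

def tree_hash(fileobj):
    """Glacier-style SHA-256 tree hash over 1 MiB chunks."""
    digests = []
    while True:
        chunk = fileobj.read(MIB)
        if not chunk:
            break
        digests.append(hashlib.sha256(chunk).digest())
    if not digests:  # empty input: hash of the empty string
        digests = [hashlib.sha256(b'').digest()]
    while len(digests) > 1:  # combine pairwise until one digest remains
        paired = [hashlib.sha256(digests[i] + digests[i + 1]).digest()
                  for i in range(0, len(digests) - 1, 2)]
        if len(digests) % 2:  # an odd digest is carried up unchanged
            paired.append(digests[-1])
        digests = paired
    return digests[0].hex()
```

Because the function reads the file object from its current position to EOF, it must be given a fresh (or rewound) handle, which is exactly why the code above reopens the file before hashing instead of reusing the exhausted upload handle.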