Problem uploading the last part of a multipart upload to S3

Date: 2018-10-25 16:11:24

Tags: python-3.x amazon-s3 boto3

I'm having trouble uploading the last part of a file to S3 as a multipart upload (boto3, Python 3.6). My code is below:

mp_upload = s3_client.create_multipart_upload(Bucket=external_bucket, Key=audience_key)
mp_upload_id = mp_upload["UploadId"]
part_info = []
upload_content = []
byte_upload_size = 0
counter = 1
uploaded_once = False
FIVE_MEGABYTE = 5000000
for key in keys_to_aggregate:
    response = s3_client.get_object(Bucket=internal_bucket, Key=key)
    byte_file_size = response["ContentLength"]
    file_content = response["Body"].read().decode()

    byte_upload_size += byte_file_size
    upload_content.append(file_content)

    if byte_upload_size >= FIVE_MEGABYTE:
        # as soon as we reach the lower limit we upload
        logger.info(f"Uploading part {counter}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()
        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)

        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1
        # freeing up uploaded data
        byte_upload_size = 0
        upload_content = []
        uploaded_once = True

if uploaded_once:
    # the last part can be less than 5MB so we need to upload it
    if byte_upload_size > 0:
        logger.info(f"Uploading last part for {job_id}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()
        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)

        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1

    s3_client.complete_multipart_upload(Bucket=external_bucket,
                                        Key=audience_key,
                                        UploadId=mp_upload_id,
                                        MultipartUpload={
                                            "Parts": part_info})
    logger.info(f"Multipart upload for {job_id} completed")
else:
    # we didn't reach the 5MB threshold so no file was uploaded
    s3_client.abort_multipart_upload(Bucket=external_bucket,
                                     Key=audience_key,
                                     UploadId=mp_upload_id)

    # we proceed with a normal put
    body = "".join(upload_content)
    body_with_header = f"{header}\n{body}".encode()
    s3_client.put_object(Bucket=external_bucket, Key=audience_key,
                         Body=body_with_header)
    logger.info(f"Single file upload completed for {job_id}")

where keys_to_aggregate is a list of keys in S3.

The problem occurs inside the if byte_upload_size > 0 check, which handles the last piece of data to upload. This piece is less than 5 MB, and I was under the impression that the last part is allowed to be smaller than 5 MB.

For some reason boto3 does not recognise the last part as the last one and throws: Error while aggregating data from S3: An error occurred (EntityTooSmall) when calling the CompleteMultipartUpload operation: Your proposed upload is smaller than the minimum allowed size

I can't find a way to mark the most recent upload as the last part. Has anyone run into this problem?

Thanks! Alessio

1 answer:

Answer 0 (score: 2)

EntityTooSmall

    Your proposed upload is smaller than the minimum allowed object size. Each part must be at least 5 MB in size, except the last part.

https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html

Reading between the lines, this error is not about your last part; it is about one or more of the earlier parts.

From this, it appears the minimum part size is not actually 5 MB (5 × 1000 × 1000 bytes) but 5 MiB (5 × 1024 × 1024 = 5,242,880 bytes).
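The buffering logic can be sketched without any S3 calls to show the effect of the threshold. This is a minimal illustration, not the asker's actual code: split_into_parts is a hypothetical helper, and the chunk sizes are made up. The point is that with the question's FIVE_MEGABYTE = 5000000 threshold, a non-final part can end up between 5,000,000 and 5,242,880 bytes and still violate S3's 5 MiB minimum, whereas a 5 MiB threshold guarantees every part except possibly the last is large enough.

```python
# S3's actual minimum part size is 5 MiB, not 5 "decimal" megabytes.
S3_MIN_PART_SIZE = 5 * 1024 * 1024  # 5,242,880 bytes

def split_into_parts(chunks, min_part_size=S3_MIN_PART_SIZE):
    """Greedily buffer byte chunks until the buffer reaches min_part_size,
    then emit a part. Only the final part may be smaller than the minimum,
    which matches S3's multipart upload rules."""
    parts, buffer, size = [], [], 0
    for chunk in chunks:
        buffer.append(chunk)
        size += len(chunk)
        if size >= min_part_size:
            parts.append(b"".join(buffer))
            buffer, size = [], 0
    if buffer:  # the last part is allowed to be under the minimum
        parts.append(b"".join(buffer))
    return parts

# The question's threshold is below S3's real minimum:
assert 5_000_000 < S3_MIN_PART_SIZE

# With the corrected threshold, every part except possibly the last is valid:
parts = split_into_parts([b"x" * 3_000_000] * 4)
assert all(len(p) >= S3_MIN_PART_SIZE for p in parts[:-1])
```

Using this constant in place of FIVE_MEGABYTE in the question's loop should be enough to avoid the EntityTooSmall error, since CompleteMultipartUpload only rejects non-final parts below 5 MiB.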