I'm having a problem uploading the last part of a file to S3 with a multipart upload (boto3, Python 3.6). My code is below:
mp_upload = s3_client.create_multipart_upload(Bucket=external_bucket, Key=audience_key)
mp_upload_id = mp_upload["UploadId"]

part_info = []
upload_content = []
byte_upload_size = 0
counter = 1
uploaded_once = False
FIVE_MEGABYTE = 5000000

for key in keys_to_aggregate:
    response = s3_client.get_object(Bucket=internal_bucket, Key=key)
    byte_file_size = response["ContentLength"]
    file_content = response["Body"].read().decode()

    byte_upload_size += byte_file_size
    upload_content.append(file_content)

    if byte_upload_size >= FIVE_MEGABYTE:
        # as soon as we reach the lower limit we upload
        logger.info(f"Uploading part {counter}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()

        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)
        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1

        # freeing up uploaded data
        byte_upload_size = 0
        upload_content = []
        uploaded_once = True

if uploaded_once:
    # the last part can be less than 5MB so we need to upload it
    if byte_upload_size > 0:
        logger.info(f"Uploading last part for {job_id}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()

        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)
        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1

    s3_client.complete_multipart_upload(Bucket=external_bucket,
                                        Key=audience_key,
                                        UploadId=mp_upload_id,
                                        MultipartUpload={"Parts": part_info})
    logger.info(f"Multipart upload for {job_id} completed")
else:
    # we didn't reach the 5MB threshold so no file was uploaded
    s3_client.abort_multipart_upload(Bucket=external_bucket,
                                     Key=audience_key,
                                     UploadId=mp_upload_id)
    # we proceed with a normal put
    body = "".join(upload_content)
    body_with_header = f"{header}\n{body}".encode()
    s3_client.put_object(Bucket=external_bucket, Key=audience_key,
                         Body=body_with_header)
    logger.info(f"Single file upload completed for {job_id}")
where keys_to_aggregate is a list of keys in S3.
The problem occurs inside the if byte_upload_size > 0 check, which handles the last chunk of data to upload. This chunk is less than 5 MB, and my understanding was that the last part of a multipart upload is allowed to be smaller than 5 MB.
For some reason boto3 does not recognise the last part as the last one and throws: Error while aggregating data from S3: An error occurred (EntityTooSmall) when calling the CompleteMultipartUpload operation: Your proposed upload is smaller than the minimum allowed size.
I can't find a way to mark the most recent upload as the last part. Has anyone run into this problem?
Thanks! Alessio
Answer 0 (score: 2)
EntityTooSmall
Your proposed upload is smaller than the minimum allowed object size. Each part must be at least 5 MB in size, except the last part.
https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html
Reading between the lines, this error is not about your last part, but about one or more of the parts before it.
It follows that the minimum part size is not actually 5 MB (5 × 1000 × 1000) but rather 5 MiB (5 × 1024 × 1024).
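A minimal sketch of the point above (the constant name MIN_PART_SIZE and the helper part_is_large_enough are illustrative, not from the original code): with the decimal threshold of 5,000,000 bytes, any part between 5,000,000 and 5,242,879 bytes passes the local check but is still below S3's 5 MiB minimum, so CompleteMultipartUpload rejects it with EntityTooSmall.

```python
# S3's minimum part size is 5 MiB (5 * 1024 * 1024 bytes), not 5 MB
# (5 * 1000 * 1000 bytes). Parts in the 242,880-byte gap between the two
# thresholds are the ones that trigger EntityTooSmall.
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB = 5,242,880 bytes

def part_is_large_enough(num_bytes, is_last_part=False):
    """Return True if S3 would accept a part of this size."""
    return is_last_part or num_bytes >= MIN_PART_SIZE

# A 5,000,000-byte part satisfies the buggy decimal threshold...
assert 5_000_000 >= 5_000_000
# ...but not the real 5 MiB minimum:
assert not part_is_large_enough(5_000_000)
assert part_is_large_enough(5_242_880)
# Only the final part may be smaller than 5 MiB:
assert part_is_large_enough(100, is_last_part=True)
```

So the fix in the question's code would be to replace FIVE_MEGABYTE = 5000000 with the binary value, ensuring every non-final part is at least 5 MiB.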