I need to copy a large file (50 GB) from one S3 bucket to another S3 bucket. Could you please advise me on this?
I need to create an AWS Lambda function in AWS for it.
Thanks in advance!
Answer 0: (score: 1)
I wrote something along these lines:
// Assumes an existing IAmazonS3 client, e.g. var client = new AmazonS3Client();
int threads = 12; // number of parts copied concurrently
long fileSize = 50L * 1024 * 1024 * 1024; // use the exact source object size here; the L suffix avoids int overflow
InitiateMultipartUploadRequest multipartRequest = new InitiateMultipartUploadRequest()
{
BucketName = "destBucket",
Key = "destKey"
};
InitiateMultipartUploadResponse multipartResponse = client.InitiateMultipartUpload(multipartRequest);
long minPartSize = 5L * 1024 * 1024; // 5 MiB minimum part size (the last part may be smaller)
long maxPartSize = 5L * 1024 * 1024 * 1024; // 5 GiB maximum part size
int maxParts = 10000; // S3 allows at most 10,000 parts per upload
long partSize = (long)Math.Ceiling(fileSize / (double)maxParts);
partSize = Math.Max(minPartSize, Math.Min(maxPartSize, partSize));
int parts = (int)Math.Ceiling(fileSize / (double)partSize);
CopyPartResponse[] partsUploaded = new CopyPartResponse[parts];
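// Copy the parts in parallel. CopyPart performs the byte-range copy inside S3, so no object data flows through this machine.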
Parallel.For(0, parts, new ParallelOptions() { MaxDegreeOfParallelism = threads }, (i) =>
{
long position = i * partSize;
long lastPosition = Math.Min(fileSize - 1, (position + partSize - 1));
var copyPartRequest = new CopyPartRequest()
{
DestinationBucket = multipartRequest.BucketName,
DestinationKey = multipartRequest.Key,
SourceBucket = "sourceBucket",
SourceKey = "sourceKey",
UploadId = multipartResponse.UploadId,
FirstByte = position,
LastByte = lastPosition,
PartNumber = i + 1
};
partsUploaded[i] = client.CopyPart(copyPartRequest);
});
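// Complete the upload by handing S3 the part numbers and ETags returned by each CopyPart call.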
CompleteMultipartUploadRequest completeRequest = new CompleteMultipartUploadRequest()
{
BucketName = multipartRequest.BucketName,
Key = multipartRequest.Key,
UploadId = multipartResponse.UploadId
};
completeRequest.AddPartETags(partsUploaded);
CompleteMultipartUploadResponse completeResponse = client.CompleteMultipartUpload(completeRequest);
It takes a large file (for example, 50 GiB) and calculates the part size to use from Amazon's minimum and maximum part sizes and the 10,000-part limit. Next, it runs a parallel for loop with up to 12 threads that copies the individual parts S3-to-S3 using S3's CopyPart operation. Finally, it "completes" the multipart object.
Note: incomplete multipart uploads count toward your bucket's storage usage. You can add a bucket lifecycle policy to delete such uploads after a given time, or use the S3 CLI to discover them.
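As a rough sketch of that cleanup (not part of the original answer), the following lists the in-progress multipart uploads in the destination bucket and aborts any that are more than a day old, reusing the same synchronous IAmazonS3 client as above; the bucket name and the one-day threshold are placeholder assumptions.
ListMultipartUploadsResponse pending = client.ListMultipartUploads(new ListMultipartUploadsRequest
{
BucketName = "destBucket" // placeholder bucket name
});
foreach (MultipartUpload upload in pending.MultipartUploads) // pagination omitted for brevity
{
if (upload.Initiated < DateTime.UtcNow.AddDays(-1)) // assume anything older than a day is abandoned
{
client.AbortMultipartUpload(new AbortMultipartUploadRequest
{
BucketName = "destBucket",
Key = upload.Key,
UploadId = upload.UploadId
});
}
}
A bucket lifecycle rule with an abort-incomplete-multipart-upload action achieves the same result without any code.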
Answer 1: (score: 0)
The boto3 Amazon S3 copy() command can copy large files:
Copies an object from one S3 location to another.
This is a managed transfer which will perform a multipart copy in multiple threads if necessary.
Answer 2: (score: -1)
I have worked through this problem. I am posting my solution in the hope that it helps you.
using System;
using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Model;
namespace s3.amazon.com.docsamples
{
class CopyObjectUsingMPUapi
{
static string sourceBucket = "*** Source bucket name ***";
static string targetBucket = "*** Target bucket name ***";
static string sourceObjectKey = "*** Source object key ***";
static string targetObjectKey = "*** Target object key ***";
static void Main(string[] args)
{
IAmazonS3 s3Client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1);
// List to store upload part responses.
List<UploadPartResponse> uploadResponses = new List<UploadPartResponse>();
List<CopyPartResponse> copyResponses = new List<CopyPartResponse>();
InitiateMultipartUploadRequest initiateRequest =
new InitiateMultipartUploadRequest
{
BucketName = targetBucket,
Key = targetObjectKey
};
InitiateMultipartUploadResponse initResponse =
s3Client.InitiateMultipartUpload(initiateRequest);
String uploadId = initResponse.UploadId;
try
{
// Get object size.
GetObjectMetadataRequest metadataRequest = new GetObjectMetadataRequest
{
BucketName = sourceBucket,
Key = sourceObjectKey
};
GetObjectMetadataResponse metadataResponse =
s3Client.GetObjectMetadata(metadataRequest);
long objectSize = metadataResponse.ContentLength; // in bytes
// Copy parts.
// Part size: at least 5 MiB, and large enough to stay within S3's 10,000-part limit.
long partSize = Math.Max(5L * 1024 * 1024, (long)Math.Ceiling(objectSize / 10000.0));
long bytePosition = 0;
for (int i = 1; bytePosition < objectSize; i++)
{
CopyPartRequest copyRequest = new CopyPartRequest
{
DestinationBucket = targetBucket,
DestinationKey = targetObjectKey,
SourceBucket = sourceBucket,
SourceKey = sourceObjectKey,
UploadId = uploadId,
FirstByte = bytePosition,
LastByte = bytePosition + partSize - 1 >= objectSize ? objectSize - 1 : bytePosition + partSize - 1,
PartNumber = i
};
copyResponses.Add(s3Client.CopyPart(copyRequest));
bytePosition += partSize;
}
CompleteMultipartUploadRequest completeRequest =
new CompleteMultipartUploadRequest
{
BucketName = targetBucket,
Key = targetObjectKey,
UploadId = initResponse.UploadId
};
completeRequest.AddPartETags(copyResponses);
CompleteMultipartUploadResponse completeUploadResponse = s3Client.CompleteMultipartUpload(completeRequest);
}
catch (Exception e)
{
Console.WriteLine(e.Message);
// Abort the upload so the incomplete parts do not keep accruing storage charges.
s3Client.AbortMultipartUpload(new AbortMultipartUploadRequest
{
BucketName = targetBucket,
Key = targetObjectKey,
UploadId = uploadId
});
}
}
// Helper function that constructs ETags.
static List<PartETag> GetETags(List<CopyPartResponse> responses)
{
List<PartETag> etags = new List<PartETag>();
foreach (CopyPartResponse response in responses)
{
etags.Add(new PartETag(response.PartNumber, response.ETag));
}
return etags;
}
}
}