Copy large files from one S3 bucket to another S3 bucket using C#

Date: 2017-05-31 01:44:04

Tags: c# amazon-s3 aws-lambda

I need to copy a large file (50 GB) from one S3 bucket to another S3 bucket. Could you please advise on this?

I need to create an AWS Lambda function in AWS for this.

Thanks in advance!

3 Answers:

Answer 0 (score: 1)

I wrote something like this:

using System;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Assumes credentials/region come from the default configuration chain.
IAmazonS3 client = new AmazonS3Client();

int threads = 12;
long fileSize = 50L * 1024 * 1024 * 1024; // use the exact file size here (L suffix avoids int overflow)
InitiateMultipartUploadRequest multipartRequest = new InitiateMultipartUploadRequest()
{
    BucketName = "destBucket",
    Key = "destKey"
};
InitiateMultipartUploadResponse multipartResponse = client.InitiateMultipartUpload(multipartRequest);

long minPartSize = 5L * 1024 * 1024;        // 5 MiB minimum, except for the last part
long maxPartSize = 5L * 1024 * 1024 * 1024; // 5 GiB, Amazon's maximum part size

// Aim for one part per thread, clamped to Amazon's part-size limits.
long partSize = (long)Math.Ceiling(fileSize / (double)threads);
partSize = Math.Max(minPartSize, Math.Min(maxPartSize, partSize));
int parts = (int)Math.Ceiling(fileSize / (double)partSize);
CopyPartResponse[] partsUploaded = new CopyPartResponse[parts];

Parallel.For(0, parts, new ParallelOptions() { MaxDegreeOfParallelism = threads }, (i) =>
{
    long position = i * partSize;
    long lastPosition = Math.Min(fileSize - 1, position + partSize - 1);
    var copyPartRequest = new CopyPartRequest()
    {
        DestinationBucket = multipartRequest.BucketName,
        DestinationKey = multipartRequest.Key,
        SourceBucket = "sourceBucket",
        SourceKey = "sourceKey",
        UploadId = multipartResponse.UploadId,
        FirstByte = position,
        LastByte = lastPosition,
        PartNumber = i + 1
    };
    partsUploaded[i] = client.CopyPart(copyPartRequest);
});

CompleteMultipartUploadRequest completeRequest = new CompleteMultipartUploadRequest()
{
    BucketName = multipartRequest.BucketName,
    Key = multipartRequest.Key,
    UploadId = multipartResponse.UploadId
};
completeRequest.AddPartETags(partsUploaded);
CompleteMultipartUploadResponse completeResponse = client.CompleteMultipartUpload(completeRequest);

It takes a large file (e.g. 50 GiB), then calculates the part size to use based on Amazon's minimum and maximum part sizes. Next, it runs a parallel for loop with up to 12 threads that copies the individual parts S3-to-S3 using S3's CopyPart feature. Finally, it "completes" the multipart file.

Note: incomplete multipart files count against your bucket usage. You can add a bucket lifecycle policy to delete such files after a given time, or use the S3 CLI to discover them. A sketch of such a lifecycle rule follows.
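For reference, here is a minimal sketch of such a lifecycle rule using the AWS SDK for .NET; the bucket name, rule id, and 7-day window are illustrative assumptions, not part of the original answer:

using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Model;

IAmazonS3 client = new AmazonS3Client();

// Abort multipart uploads that are still incomplete 7 days after they
// were initiated, so their parts stop accruing storage charges.
// Bucket name and day count are assumptions for illustration.
PutLifecycleConfigurationRequest lifecycleRequest = new PutLifecycleConfigurationRequest
{
    BucketName = "destBucket",
    Configuration = new LifecycleConfiguration
    {
        Rules = new List<LifecycleRule>
        {
            new LifecycleRule
            {
                Id = "abort-incomplete-multipart-uploads",
                Status = LifecycleRuleStatus.Enabled,
                // Empty prefix applies the rule to the whole bucket.
                Filter = new LifecycleFilter
                {
                    LifecycleFilterPredicate = new LifecyclePrefixPredicate { Prefix = "" }
                },
                AbortIncompleteMultipartUpload = new LifecycleRuleAbortIncompleteMultipartUpload
                {
                    DaysAfterInitiation = 7
                }
            }
        }
    }
};
client.PutLifecycleConfiguration(lifecycleRequest);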

Answer 1 (score: 0)

The boto3 Amazon S3 copy() command can copy large files:

"Copy an object from one S3 location to another.

This is a managed transfer which will perform a multipart copy in multiple threads if necessary."

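For illustration, a minimal boto3 sketch of that managed transfer; the bucket and key names are placeholder assumptions:

import boto3

s3 = boto3.client('s3')

# copy() is the managed-transfer call quoted above: for large objects
# it automatically performs a multipart, multithreaded copy.
copy_source = {'Bucket': 'sourceBucket', 'Key': 'sourceKey'}
s3.copy(copy_source, 'destBucket', 'destKey')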

Answer 2 (score: -1)

I have worked through this same problem. I'm posting my solution so that it can help you.

using System;
using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Model;

namespace s3.amazon.com.docsamples
{
    class CopyObjectUsingMPUapi
    {

    static string sourceBucket    = "*** Source bucket name ***";
    static string targetBucket    = "*** Target bucket name ***";
    static string sourceObjectKey = "*** Source object key ***";
    static string targetObjectKey = "*** Target object key ***";

    static void Main(string[] args)
    {
        IAmazonS3 s3Client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1);

        // List to store the copy part responses.
        List<CopyPartResponse> copyResponses = new List<CopyPartResponse>();
        InitiateMultipartUploadRequest initiateRequest =
               new InitiateMultipartUploadRequest
                   {
                       BucketName = targetBucket,
                       Key = targetObjectKey
                   };

        InitiateMultipartUploadResponse initResponse =
            s3Client.InitiateMultipartUpload(initiateRequest);
        string uploadId = initResponse.UploadId;

        try
        {
            // Get object size.
            GetObjectMetadataRequest metadataRequest = new GetObjectMetadataRequest
                {
                     BucketName = sourceBucket,
                     Key        = sourceObjectKey
                };

            GetObjectMetadataResponse metadataResponse = 
                         s3Client.GetObjectMetadata(metadataRequest);
            long objectSize = metadataResponse.ContentLength; // in bytes

            // Copy parts. S3 allows at most 10,000 parts per upload, so a
            // 5 MiB part size can overflow that limit for a ~50 GiB object;
            // 16 MiB keeps the part count comfortably under the cap.
            long partSize = 16 * (long)Math.Pow(2, 20); // 16 MiB

            long bytePosition = 0;
            for (int i = 1; bytePosition < objectSize; i++)
            {

                CopyPartRequest copyRequest = new CopyPartRequest
                    {
                        DestinationBucket = targetBucket,
                        DestinationKey = targetObjectKey,
                        SourceBucket = sourceBucket,
                        SourceKey = sourceObjectKey,
                        UploadId = uploadId,
                        FirstByte = bytePosition,
                        LastByte = bytePosition + partSize - 1 >= objectSize ? objectSize - 1 : bytePosition + partSize - 1,
                        PartNumber = i
                    };

                copyResponses.Add(s3Client.CopyPart(copyRequest));

                bytePosition += partSize;
            }
            CompleteMultipartUploadRequest completeRequest =
                  new CompleteMultipartUploadRequest
                      {
                          BucketName = targetBucket,
                          Key = targetObjectKey,
                          UploadId = initResponse.UploadId
                      };

            completeRequest.AddPartETags(copyResponses);
            CompleteMultipartUploadResponse completeUploadResponse = s3Client.CompleteMultipartUpload(completeRequest);

        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
            // Abort the upload so the orphaned parts don't keep accruing
            // storage charges (see the lifecycle note in the first answer).
            s3Client.AbortMultipartUpload(new AbortMultipartUploadRequest
            {
                BucketName = targetBucket,
                Key = targetObjectKey,
                UploadId = uploadId
            });
        }
    }

    // Helper that converts copy responses to part ETags. (Unused above:
    // AddPartETags accepts the CopyPartResponse objects directly.)
    static List<PartETag> GetETags(List<CopyPartResponse> responses)
    {
        List<PartETag> etags = new List<PartETag>();
        foreach (CopyPartResponse response in responses)
        {
            etags.Add(new PartETag(response.PartNumber, response.ETag));
        }
        return etags;
    }
    }
}