Uploading files to Azure blob storage takes longer for larger files

Time: 2016-12-09 06:59:57

Tags: azure azure-storage azure-storage-blobs

Hi all...

I am trying to upload larger files (over 100 MB in size) to Azure blob storage. Below is the code.

My problem is that even though I am using BeginPutBlock together with the TPL (task parallelism), it takes a long time (about 20 minutes to upload 100 MB), and I need to upload files over 2 GB in size. Could anyone please help me with this?

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;              // Incremental, RetryPolicy
using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling; // StorageTransientErrorDetectionStrategy
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;

namespace BlobSamples
{
    public class UploadAsync
    {
        static void Main(string[] args)
        {
            //string filePath = @"D:\Frameworks\DNCMag-Issue26-DoubleSpread.pdf";
            string filePath = @"E:\E Books\NageswaraRao Meterial\ebooks\applied_asp.net_4_in_context.pdf";
            string accountName = "{account name}";
            string accountKey = "{account key}";
            string containerName = "sampleContainer";
            string blobName = Path.GetFileName(filePath);
            //byte[] fileContent = File.ReadAllBytes(filePath);
            Stream fileContent = System.IO.File.OpenRead(filePath);

            StorageCredentials creds = new StorageCredentials(accountName, accountKey);
            CloudStorageAccount storageAccount = new CloudStorageAccount(creds, useHttps: true);
            CloudBlobClient blobclient = storageAccount.CreateCloudBlobClient();
            CloudBlobContainer container = blobclient.GetContainerReference(containerName);
            CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

            // Define your retry strategy: retry 5 times, starting 1 second apart
            // and adding 2 seconds to the interval each retry.
            var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1),
              TimeSpan.FromSeconds(2));

            // Define your retry policy using the retry strategy and the Azure storage
            // transient fault detection strategy.
            var retryPolicy =
              new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);

            // Receive notifications about retries.
            retryPolicy.Retrying += (sender, arg) =>
                {
                    // Log details of the retry.
                    var msg = String.Format("Retry - Count:{0}, Delay:{1}, Exception:{2}",
                        arg.CurrentRetryCount, arg.Delay, arg.LastException);
                    Console.WriteLine(msg);
                };

            Console.WriteLine("Upload Started" + DateTime.Now);
            ChunkedUploadStreamAsync(blob, fileContent, (1024*1024), retryPolicy);
            Console.WriteLine("Upload Ended" + DateTime.Now);
            Console.ReadLine();
        }

        private static Task PutBlockAsync(CloudBlockBlob blob, string id, Stream stream, RetryPolicy policy)
        {
            Func<Task> uploadTaskFunc = () =>
            {
                stream.Position = 0; // rewind so a retried attempt re-uploads the whole block
                return Task.Factory.FromAsync(
                    (asyncCallback, state) => blob.BeginPutBlock(id, stream, null, null, null, null, asyncCallback, state),
                    blob.EndPutBlock,
                    null);
            };
            Console.WriteLine("Queued block " + id + " " + DateTime.Now); // logged when the block is queued, not when it finishes
            return policy.ExecuteAsync(uploadTaskFunc);
        }

        public static Task ChunkedUploadStreamAsync(CloudBlockBlob blob, Stream source, int chunkSize, RetryPolicy policy)
        {
            var blockids = new List<string>();
            var blockid = 0;

            int count;

            // first create a list of TPL Tasks for uploading blocks asynchronously
            var tasks = new List<Task>();

            var bytes = new byte[chunkSize];
            while ((count = source.Read(bytes, 0, bytes.Length)) != 0)
            {
                var id = Convert.ToBase64String(BitConverter.GetBytes(++blockid));
                blockids.Add(id);
                tasks.Add(PutBlockAsync(blob, id, new MemoryStream(bytes, 0, count), policy)); // pass only the bytes actually read so the final block is not zero-padded
                bytes = new byte[chunkSize]; // need a new buffer to avoid overwriting the previous one
            }

            return Task.Factory.ContinueWhenAll(
                tasks.ToArray(),
                array =>
                {
                    // propagate exceptions and mark all faulted Tasks as observed
                    Task.WaitAll(array);
                    // commit the block list; Wait() ensures the commit completes before we log
                    policy.ExecuteAction(() => blob.PutBlockListAsync(blockids).Wait());
                    Console.WriteLine("Upload Completed " + DateTime.Now);
                });
        }
    }
}

3 Answers:

Answer 0 (score: 0)

If a command-line tool is acceptable to you, try AzCopy, which transfers Azure Storage data with high performance and can resume interrupted transfers.

If you want to control the transfer jobs programmatically, use the Azure Storage Data Movement Library, which is the core of AzCopy.
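For reference, here is a rough sketch of an upload through the Data Movement Library (NuGet package Microsoft.Azure.Storage.DataMovement); the connection string, container, and file names are placeholders, and the exact namespace can vary between library versions:

using System;
using System.Net;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.DataMovement;

class DataMovementUpload
{
    static void Main()
    {
        // Raise the HTTP connection limit so parallel block uploads are not throttled client-side.
        ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;

        // Number of blocks transferred in parallel across all transfers.
        TransferManager.Configurations.ParallelOperations = 64;

        CloudStorageAccount account = CloudStorageAccount.Parse("{connection string}");
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("sampleContainer");
        CloudBlockBlob destBlob = container.GetBlockBlobReference("bigfile.bin");

        Console.WriteLine("Upload Started " + DateTime.Now);
        // UploadAsync splits the file into blocks and uploads them in parallel.
        TransferManager.UploadAsync(@"E:\bigfile.bin", destBlob).Wait();
        Console.WriteLine("Upload Ended " + DateTime.Now);
    }
}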

Answer 1 (score: 0)

As we know, a block blob is made up of blocks, and each block can be up to 4 MB in size. Per your code, you set the block size to 1 MB and upload each block in parallel yourself. Put simply, you can instead take advantage of the ParallelOperationThreadCount property to upload blob blocks in parallel, as follows:

//set the number of blocks that may be uploaded simultaneously
var requestOption = new BlobRequestOptions()
{
    ParallelOperationThreadCount = 5,
    //maximum size in bytes of a blob that may be uploaded as a single blob
    SingleBlobUploadThresholdInBytes = 10 * 1024 * 1024 //64MB maximum; 32MB by default
};

//upload a file to the blob
blob.UploadFromFile("{filepath}", options: requestOption);

With that option set, when your blob (file) is larger than the value of SingleBlobUploadThresholdInBytes, the storage client automatically splits the file into blocks (4 MB in size) and uploads the blocks simultaneously.

Per your requirement, I created an ASP.NET Web API application that exposes an API for uploading files to Azure Blob storage.

Project URL: AspDotNet-WebApi-AzureBlobFileUploadSample
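For illustration only (this is not the sample project's actual code; the controller name, container name, and connection string are made up), a Web API controller that accepts a multipart upload and writes it to blob storage looks roughly like this:

using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public class UploadController : ApiController
{
    [HttpPost]
    public async Task<IHttpActionResult> Post()
    {
        if (!Request.Content.IsMimeMultipartContent())
            return StatusCode(HttpStatusCode.UnsupportedMediaType);

        // Buffers each part in memory; a streaming provider would be preferable for multi-GB files.
        var provider = await Request.Content.ReadAsMultipartAsync(new MultipartMemoryStreamProvider());

        var account = CloudStorageAccount.Parse("{connection string}");
        var container = account.CreateCloudBlobClient().GetContainerReference("sampleContainer");

        foreach (var part in provider.Contents)
        {
            // Use the client-supplied file name as the blob name.
            var fileName = part.Headers.ContentDisposition.FileName.Trim('"');
            var blob = container.GetBlockBlobReference(fileName);
            using (var stream = await part.ReadAsStreamAsync())
            {
                // UploadFromStreamAsync chunks large streams into blocks internally.
                await blob.UploadFromStreamAsync(stream);
            }
        }
        return Ok();
    }
}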

Note:

To upload large files, you need to increase both maxRequestLength and maxAllowedContentLength in web.config, as follows:

<system.web>
   <httpRuntime maxRequestLength="2097152"/>  <!--in KB; 4MB by default, increased here to 2GB-->
</system.web>
<system.webServer> 
      <security> 
          <requestFiltering> 
             <requestLimits maxAllowedContentLength="2147483648" />  <!--in bytes; increased here to 2GB-->
          </requestFiltering> 
      </security> 
</system.webServer>

Screenshot (image omitted)

Answer 2 (score: -1)

I suggest you use AzCopy to upload large files; it saves a lot of coding time and is more efficient. To upload a single file, run the following command:

AzCopy /Source:C:\folder /Dest:https://youraccount.blob.core.windows.net/container /DestKey:key /Pattern:"test.txt"