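Limiting the number of parallel tasks started by Azure DownloadToStreamAsync

Time: 2014-02-03 11:01:01

Tags: c# azure task-parallel-library

I need to download a bunch of files (about 10k) from Windows Azure storage. To download them in parallel rather than one at a time, I am using the blob's DownloadToStreamAsync method, which returns a Task object. I then attach a ContinueWith to each task that saves the stream to a file.

Here is the code: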

foreach (var File in ServerFiles)
{
    string sFileName = File.Uri.LocalPath.ToString();
    CloudBlockBlob oBlob = BiActionscontainer.GetBlockBlobReference(sFileName.Replace("/" + Container + "/", ""));

    MemoryStream ms = new MemoryStream();
    BlobRequestOptions f = new BlobRequestOptions();
    Task downloadTask = oBlob.DownloadToStreamAsync(ms);

    downloadTask.ContinueWith((Task task) =>
    {
         ms.Position = 0;
         lock(lockObject)
         {
              using (FileStream file = new FileStream(ResultPath, FileMode.Append, FileAccess.Write))
              {
                   byte[] bytes = ms.ToArray();
                   file.Write(bytes, 0, bytes.Length);
              }
         }
         ms.Dispose();
    });
}

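This code is part of a tool that runs on one of our own servers (not Azure), a Windows 2003 server. The problem is that on that server I get “The operation has timed out. Microsoft.WindowsAzure.Storage on windows 2003 standard”, so I suspect that too many files are being requested at the same time and are saturating the bandwidth.

So what I would like to know is: given that I get the Task objects from a third-party library, how can I limit the number that run in parallel at any one time, and queue the rest?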

1 answer:

Answer 0: (score: 2)

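You can use SemaphoreSlim. Initialize it with the number of concurrent requests you want to allow, then call await WaitAsync() before starting each request, call Release() after each request completes, and finally wait for the remaining tasks.

Wrapped in a helper method, it could look like this: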

public static async Task ForEachAsync<T>(
    this IEnumerable<T> items, Func<T, Task> action, int maxDegreeOfParallelism)
{
    var semaphore = new SemaphoreSlim(maxDegreeOfParallelism);

    var tasks = new List<Task>();

    foreach (var item in items)
    {
        await semaphore.WaitAsync();

        Func<T, Task> loopAction = async x =>
        {
            // Release in finally so a failed action cannot stall the loop.
            try { await action(x); }
            finally { semaphore.Release(); }
        };

        tasks.Add(loopAction(item));
    }

    await Task.WhenAll(tasks);
}

Usage (with some changes to your code, mainly to simplify it and make it more asynchronous):

// Must be called from an async method; 10 is an arbitrary example limit.
await ServerFiles.ForEachAsync(async serverFile =>
{
    string sFileName = serverFile.Uri.LocalPath.ToString();
    CloudBlockBlob oBlob = BiActionscontainer.GetBlockBlobReference(sFileName.Replace("/" + Container + "/", ""));

    using (var ms = new MemoryStream())
    {
        await oBlob.DownloadToStreamAsync(ms);

        ms.Position = 0;
        // await is not allowed inside lock, so the copy to the file is synchronous.
        lock (lockObject)
        {
            using (var file = new FileStream(ResultPath, FileMode.Append, FileAccess.Write))
            {
                ms.CopyTo(file);
            }
        }
    }
}, 10);
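To check that the semaphore really caps concurrency, the helper can be exercised without any Azure dependency. The following is a minimal sketch, not part of the original answer: `ThrottleDemo`, `RunAsync` and `MeasurePeakAsync` are hypothetical names, `Task.Delay` stands in for `DownloadToStreamAsync`, and `Release()` is placed in a `finally` block so a failing action cannot block the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Same shape as the helper above: wait for a free slot, start the action,
    // and give the slot back when the action finishes.
    public static async Task ForEachAsync<T>(
        this IEnumerable<T> items, Func<T, Task> action, int maxDegreeOfParallelism)
    {
        using (var semaphore = new SemaphoreSlim(maxDegreeOfParallelism))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                await semaphore.WaitAsync();
                tasks.Add(RunAsync(item, action, semaphore));
            }
            await Task.WhenAll(tasks);
        }
    }

    // Hypothetical helper: runs one action and always releases its slot.
    private static async Task RunAsync<T>(T item, Func<T, Task> action, SemaphoreSlim semaphore)
    {
        try { await action(item); }
        finally { semaphore.Release(); }
    }

    // Runs `count` fake downloads and returns the highest number seen in flight at once.
    public static async Task<int> MeasurePeakAsync(int count, int limit)
    {
        int inFlight = 0, peak = 0;
        var gate = new object();

        await Enumerable.Range(0, count).ForEachAsync(async _ =>
        {
            lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
            await Task.Delay(20); // stands in for DownloadToStreamAsync
            lock (gate) { inFlight--; }
        }, limit);

        return peak;
    }

    public static async Task Main()
    {
        int peak = await MeasurePeakAsync(count: 50, limit: 4);
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

The reported peak should never exceed the limit passed in. Because the loop awaits `WaitAsync()` before touching the next item, the source sequence is enumerated lazily, only as fast as slots become free.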

An alternative implementation would use ActionBlock from TPL Dataflow. It already knows how to do what is needed here; you just have to configure it:

public static Task ForEachAsync<T>(
    this IEnumerable<T> items, Func<T, Task> action, int maxDegreeOfParallelism)
{
    var block = new ActionBlock<T>(
        action,
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        });

    foreach (var item in items)
    {
        block.Post(item);
    }

    block.Complete();
    return block.Completion;
}
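The Dataflow variant can be tried the same way. This is a rough sketch under a few assumptions: `DataflowThrottleDemo` and `MeasurePeakAsync` are hypothetical names, `Task.Delay` stands in for the download, and the `System.Threading.Tasks.Dataflow` namespace is available (on .NET Framework it likely has to be added as a NuGet package).

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowThrottleDemo
{
    // Posts `count` fake downloads to an ActionBlock and returns the highest
    // number of actions observed running at the same time.
    public static async Task<int> MeasurePeakAsync(int count, int limit)
    {
        int inFlight = 0, peak = 0;
        var gate = new object();

        var block = new ActionBlock<int>(async _ =>
        {
            lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
            await Task.Delay(20); // stands in for DownloadToStreamAsync
            lock (gate) { inFlight--; }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });

        // With the default (unbounded) capacity, every item is queued immediately.
        foreach (var i in Enumerable.Range(0, count))
        {
            block.Post(i);
        }

        block.Complete();       // no more items will be posted
        await block.Completion; // finishes once all queued items are processed
        return peak;
    }

    public static async Task Main()
    {
        int peak = await MeasurePeakAsync(count: 50, limit: 4);
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

One difference from the semaphore version is worth keeping in mind: here all items are handed to the block's input queue up front and the block decides when to run them, whereas the semaphore loop pulls the next item only when a slot is free. For about 10k small work items either approach should be fine.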