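I need to download a bunch of files (about 10k) from Windows Azure storage. To download them in parallel rather than one at a time, I am using the blob's DownloadToStreamAsync method, which returns a Task object. I then attach a ContinueWith to the task with a method that saves the stream to a file.
Here is the code: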
foreach (var File in ServerFiles)
{
    string sFileName = File.Uri.LocalPath.ToString();
    CloudBlockBlob oBlob = BiActionscontainer.GetBlockBlobReference(sFileName.Replace("/" + Container + "/", ""));
    MemoryStream ms = new MemoryStream();
    BlobRequestOptions f = new BlobRequestOptions();
    Task downloadTask = oBlob.DownloadToStreamAsync(ms);
    downloadTask.ContinueWith((Task task) =>
    {
        ms.Position = 0;
        lock(lockObject)
        {
            using (FileStream file = new FileStream(ResultPath, FileMode.Append, FileAccess.Write))
            {
                byte[] bytes = ms.ToArray();
                file.Write(bytes, 0, bytes.Length);
            }
        }
        ms.Dispose();
    });
}
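This code is part of a tool that runs on one of our servers (not on Azure), a Windows 2003 server. The problem is that on that server I get "The operation has timed out. Microsoft.WindowsAzure.Storage on windows 2003 standard", so I think too many files may be requested at the same time and are choking the bandwidth.
So I would like to know: given that I get the Task objects from a third-party library, how can I limit the number of tasks running in parallel at any one time, while the remaining ones stay queued?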
Answer 0 (score: 2)
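You can use SemaphoreSlim. Initialize it with the number of concurrent requests you want to allow, then await WaitAsync() before starting each request, call Release() after each request completes, and finally wait for the remaining tasks.
Wrapped in a helper method, it looks like this: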
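// Requires: using System; using System.Collections.Generic;
//           using System.Threading; using System.Threading.Tasks;
// As an extension method, this must be declared inside a static class.
public static async Task ForEachAsync<T>(
    this IEnumerable<T> items, Func<T, Task> action, int maxDegreeOfParallelism)
{
    var semaphore = new SemaphoreSlim(maxDegreeOfParallelism);
    var tasks = new List<Task>();
    foreach (var item in items)
    {
        // Wait for a free slot before starting the next item.
        await semaphore.WaitAsync();
        Func<T, Task> loopAction = async x =>
        {
            try
            {
                await action(x);
            }
            finally
            {
                // Free the slot even if the action throws, so the loop cannot stall.
                semaphore.Release();
            }
        };
        tasks.Add(loopAction(item));
    }
    await Task.WhenAll(tasks);
}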
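To check that a helper like this really caps concurrency, a small self-contained sketch can replace the blob download with a simulated delay and record the highest number of actions running at once. The names ThrottleDemo and MeasurePeakAsync and the 20 ms delay are invented for this illustration, and the helper is repeated here (with the release in a finally block) only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Same idea as the helper above, repeated so this sketch is self-contained.
    public static async Task ForEachAsync<T>(
        this IEnumerable<T> items, Func<T, Task> action, int maxDegreeOfParallelism)
    {
        var semaphore = new SemaphoreSlim(maxDegreeOfParallelism);
        var tasks = new List<Task>();
        foreach (var item in items)
        {
            await semaphore.WaitAsync();
            tasks.Add(RunAndReleaseAsync(item, action, semaphore));
        }
        await Task.WhenAll(tasks);
    }

    private static async Task RunAndReleaseAsync<T>(
        T item, Func<T, Task> action, SemaphoreSlim semaphore)
    {
        try { await action(item); }
        finally { semaphore.Release(); }
    }

    // Hypothetical helper: runs itemCount fake "downloads" and
    // returns the peak number that were in flight at the same time.
    public static async Task<int> MeasurePeakAsync(int itemCount, int limit)
    {
        int running = 0, peak = 0;
        var gate = new object();
        await Enumerable.Range(0, itemCount).ForEachAsync(async _ =>
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            await Task.Delay(20); // stands in for DownloadToStreamAsync
            lock (gate) { running--; }
        }, limit);
        return peak;
    }

    public static void Main()
    {
        int peak = MeasurePeakAsync(20, 3).GetAwaiter().GetResult();
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

The reported peak should never exceed the limit passed in, because each action starts only after WaitAsync() has granted a slot and the slot is returned only after the action has finished.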
Usage (with a few changes to your code, mainly to simplify it and make it more asynchronous):
// Call this from an async method. The limit of 10 is only an example value;
// tune it for your server and bandwidth.
await ServerFiles.ForEachAsync(async blobItem =>
{
    string sFileName = blobItem.Uri.LocalPath.ToString();
    CloudBlockBlob oBlob = BiActionscontainer.GetBlockBlobReference(sFileName.Replace("/" + Container + "/", ""));
    using (var ms = new MemoryStream())
    {
        await oBlob.DownloadToStreamAsync(ms);
        ms.Position = 0;
        // await is not allowed inside a lock block, so the copy to disk is synchronous.
        lock (lockObject)
        {
            using (var output = new FileStream(ResultPath, FileMode.Append, FileAccess.Write))
            {
                ms.CopyTo(output);
            }
        }
    }
}, 10);
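C# does not allow await inside a lock block, so if the append to the result file should itself stay asynchronous, one option is to use a SemaphoreSlim with a single permit as an async-compatible mutex. The sketch below is not part of the original answer; the names FileAppender, AppendAsync, and DemoAsync are hypothetical, and it writes to a temporary file rather than to ResultPath.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FileAppender
{
    // One permit: only one writer may hold it at a time.
    private static readonly SemaphoreSlim FileGate = new SemaphoreSlim(1, 1);

    // Appends the contents of source to the file at path, one writer at a time.
    public static async Task AppendAsync(string path, Stream source)
    {
        await FileGate.WaitAsync();
        try
        {
            using (var output = new FileStream(path, FileMode.Append, FileAccess.Write))
            {
                await source.CopyToAsync(output);
            }
        }
        finally
        {
            FileGate.Release();
        }
    }

    // Hypothetical check: appends `writers` chunks of `chunkSize` bytes
    // concurrently and returns the resulting file length.
    public static async Task<long> DemoAsync(int writers, int chunkSize)
    {
        string path = Path.GetTempFileName();
        try
        {
            var tasks = Enumerable.Range(0, writers).Select(async i =>
            {
                using (var ms = new MemoryStream(new byte[chunkSize]))
                {
                    await AppendAsync(path, ms);
                }
            });
            await Task.WhenAll(tasks);
            return new FileInfo(path).Length;
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        long length = DemoAsync(8, 1024).GetAwaiter().GetResult();
        Console.WriteLine("Bytes written: " + length);
    }
}
```

Whether this is worth it over a plain lock with a synchronous copy depends on how large the blobs are; for small in-memory buffers the synchronous write is usually simpler.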
An alternative implementation would use ActionBlock from TPL Dataflow. It already knows how to do what is needed here; you just need to configure it:
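// Requires: using System.Threading.Tasks.Dataflow;
public static Task ForEachAsync<T>(
    this IEnumerable<T> items, Func<T, Task> action, int maxDegreeOfParallelism)
{
    var block = new ActionBlock<T>(
        action,
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        });
    // Queue every item; the block runs at most maxDegreeOfParallelism at a time.
    foreach (var item in items)
    {
        block.Post(item);
    }
    // Signal that no more items will arrive, then let the caller await completion.
    block.Complete();
    return block.Completion;
}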
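As a rough way to see the Dataflow version at work, the following sketch feeds fake items into an ActionBlock and reports how many were processed and how many ran at once. DataflowDemo, RunAsync, and the 20 ms delay are hypothetical names and values chosen for the example, and it assumes the System.Threading.Tasks.Dataflow package is referenced.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // from the System.Threading.Tasks.Dataflow package

public static class DataflowDemo
{
    // Hypothetical helper: processes itemCount fake items and returns
    // { items processed, peak concurrency observed }.
    public static async Task<int[]> RunAsync(int itemCount, int limit)
    {
        int processed = 0, running = 0, peak = 0;
        var gate = new object();

        var block = new ActionBlock<int>(async item =>
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            await Task.Delay(20); // stands in for the real download
            lock (gate) { running--; processed++; }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });

        foreach (var item in Enumerable.Range(0, itemCount))
        {
            // With the default unbounded capacity, Post accepts every item.
            block.Post(item);
        }
        block.Complete();
        await block.Completion;
        return new[] { processed, peak };
    }

    public static void Main()
    {
        int[] result = RunAsync(20, 3).GetAwaiter().GetResult();
        Console.WriteLine("Processed: " + result[0] + ", peak concurrency: " + result[1]);
    }
}
```

All queued items should be processed, and the observed peak should stay at or below the configured MaxDegreeOfParallelism.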