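Nesting await in Parallel.ForEach

Time: 2012-07-19 15:47:17

Tags: c# wcf async-await task-parallel-library parallel.foreach

In a Metro app, I need to execute a number of WCF calls. There are a significant number of calls to be made, so I need to do them in a parallel loop. The problem is that the parallel loop exits before the WCF calls are all complete.

How would you refactor this to work as expected?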

var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customers = new  System.Collections.Concurrent.BlockingCollection<Customer>();

Parallel.ForEach(ids, async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
});

foreach ( var customer in customers )
{
    Console.WriteLine(customer.ID);
}

Console.ReadKey();

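11 answers:

Answer 0 (score: 140)

The whole idea behind Parallel.ForEach() is that you have a set of threads and each thread processes part of the collection. As you noticed, this doesn't work with async-await, where you want to release the thread for the duration of the async call.

You could "fix" that by blocking the ForEach() threads, but that defeats the whole point of async-await.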
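
To see why the loop in the question returns early: Parallel.ForEach has no overload that accepts a Func<T, Task>, so the async lambda is compiled as an async void Action<T>, and each body returns to the loop at its first await. Here is a minimal sketch of that effect. EarlyExitDemo and the 500 ms Task.Delay are illustrative stand-ins (assumptions) for the WCF call, not part of the original code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class EarlyExitDemo
{
    // Returns how many loop bodies had finished at the moment Parallel.ForEach returned.
    public static int CountCompletedWhenLoopReturns()
    {
        int completed = 0;

        // The async lambda becomes an async void Action<int>: the loop only sees
        // the synchronous part of each body, up to the first await.
        Parallel.ForEach(new[] { 1, 2, 3 }, async i =>
        {
            await Task.Delay(500);               // stand-in for the async WCF call
            Interlocked.Increment(ref completed);
        });

        // The loop has already returned here, while the delays are still pending,
        // so with this delay the count is expected to still be 0.
        return Volatile.Read(ref completed);
    }
}
```

The same thing happens in the question's code: customers is still empty when the foreach over it starts.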

What you could do is use TPL Dataflow instead of Parallel.ForEach(), which supports asynchronous Tasks well.

Specifically, your code could be written using a TransformBlock that transforms each id into a Customer using the async lambda. This block can be configured to execute in parallel. You would link that block to an ActionBlock that writes each Customer to the console. After you set up the block network, you can Post() each id to the TransformBlock.

In code:

var ids = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

var getCustomerBlock = new TransformBlock<string, Customer>(
    async i =>
    {
        ICustomerRepo repo = new CustomerRepo();
        return await repo.GetCustomer(i);
    }, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });
var writeCustomerBlock = new ActionBlock<Customer>(c => Console.WriteLine(c.ID));
getCustomerBlock.LinkTo(
    writeCustomerBlock, new DataflowLinkOptions
    {
        PropagateCompletion = true
    });

foreach (var id in ids)
    getCustomerBlock.Post(id);

getCustomerBlock.Complete();
writeCustomerBlock.Completion.Wait();

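Although you probably want to limit the parallelism of the TransformBlock to some small constant. Also, you could limit the capacity of the TransformBlock and add the items to it asynchronously using SendAsync(), for example if the collection is too big.

An added benefit compared to your code (if it worked) is that the writing will start as soon as a single item is finished, rather than waiting until all of the processing is finished.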
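
The bounded variant mentioned above could look roughly like the following sketch. It assumes the System.Threading.Tasks.Dataflow package is referenced. FetchCustomerAsync is a hypothetical stand-in for repo.GetCustomer, and the values 4 and 8 are arbitrary illustrative limits:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedDataflowSketch
{
    // Hypothetical stand-in for repo.GetCustomer(id) from the question.
    private static async Task<string> FetchCustomerAsync(string id)
    {
        await Task.Delay(50);
        return "customer-" + id;
    }

    public static async Task<List<string>> RunAsync(IEnumerable<string> ids)
    {
        var results = new List<string>();

        var fetch = new TransformBlock<string, string>(
            id => FetchCustomerAsync(id),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 4, // small constant instead of Unbounded
                BoundedCapacity = 8         // limits how many items the block holds
            });

        // ActionBlock processes one item at a time by default, so a plain List is safe here.
        var collect = new ActionBlock<string>(c => results.Add(c));

        fetch.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var id in ids)
        {
            // SendAsync waits asynchronously while the block is full;
            // Post would simply return false in that situation.
            await fetch.SendAsync(id);
        }

        fetch.Complete();
        await collect.Completion;
        return results;
    }
}
```

With a bounded block, the producer loop is throttled by the consumers, so a very large id collection is never fully buffered in the block.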

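Answer 1 (score: 104)

svick's answer is (as usual) excellent.

However, I find Dataflow to be more useful when you actually have large amounts of data to transfer, or when you need an async-compatible queue.

In your case, a simpler solution is to just use async-style parallelism: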

var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

var customerTasks = ids.Select(i =>
  {
    ICustomerRepo repo = new CustomerRepo();
    return repo.GetCustomer(i);
  });
var customers = await Task.WhenAll(customerTasks);

foreach (var customer in customers)
{
  Console.WriteLine(customer.ID);
}

Console.ReadKey();

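Answer 2 (score: 67)

Using DataFlow as svick suggested may be overkill, and Stephen's answer does not provide a way to control the concurrency of the operation. However, that can be achieved rather simply: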

public static async Task RunWithMaxDegreeOfConcurrency<T>(
     int maxDegreeOfConcurrency, IEnumerable<T> collection, Func<T, Task> taskFactory)
{
    var activeTasks = new List<Task>(maxDegreeOfConcurrency);
    foreach (var task in collection.Select(taskFactory))
    {
        activeTasks.Add(task);
        if (activeTasks.Count == maxDegreeOfConcurrency)
        {
            await Task.WhenAny(activeTasks.ToArray());
            //observe exceptions here
            activeTasks.RemoveAll(t => t.IsCompleted); 
        }
    }
    await Task.WhenAll(activeTasks.ToArray()).ContinueWith(t => 
    {
        //observe exceptions in a manner consistent with the above   
    });
}

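The ToArray() calls can be optimized by using an array instead of a list and replacing completed tasks, but I doubt it would make much of a difference in most scenarios. Sample usage per the OP's question: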

await RunWithMaxDegreeOfConcurrency(10, ids, async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
});

EDIT: Fellow SO user and TPL wiz Eli Arbel pointed me to a related article from Stephen Toub. As usual, his implementation is both elegant and efficient:

public static Task ForEachAsync<T>(
      this IEnumerable<T> source, int dop, Func<T, Task> body) 
{ 
    return Task.WhenAll( 
        from partition in Partitioner.Create(source).GetPartitions(dop) 
        select Task.Run(async delegate { 
            using (partition) 
                while (partition.MoveNext()) 
                    await body(partition.Current).ContinueWith(t => 
                          {
                              //observe exceptions
                          });

        })); 
}
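
As a rough check of what dop means in the partitioner-based method above, the sketch below runs the same approach and records the highest number of bodies in flight at once. Two things here are assumptions made for self-containment and are not in the original: the method is declared as a plain static method (not an extension method), and the exception-swallowing ContinueWith is dropped so failures propagate. MeasurePeakConcurrencyAsync is a hypothetical helper.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncUsageSketch
{
    // Same partitioning idea as above: dop partitions, each drained sequentially by one task.
    public static Task ForEachAsync<T>(IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

    // Hypothetical helper: returns the highest number of concurrently running bodies observed.
    public static async Task<int> MeasurePeakConcurrencyAsync(int dop)
    {
        int running = 0, peak = 0;

        await ForEachAsync(Enumerable.Range(0, 40), dop, async item =>
        {
            int now = Interlocked.Increment(ref running);

            // Lock-free "peak = max(peak, now)".
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

            await Task.Delay(20); // stand-in for real asynchronous work
            Interlocked.Decrement(ref running);
        });

        return peak;
    }
}
```

Because each partition is consumed by a single task that awaits one body at a time, the observed peak cannot exceed dop.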

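Answer 3 (score: 28)

You can save effort with the new AsyncEnumerator NuGet package, which didn't exist 4 years ago when the question was originally posted. It allows you to control the degree of parallelism: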

using System.Collections.Async;
...

await ids.ParallelForEachAsync(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
},
maxDegreeOfParallelism: 10);

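Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.

Answer 4 (score: 12)

Wrap the Parallel.ForEach in a Task.Run() and, instead of the await keyword, use [yourasyncmethod].Result

(you need to do the Task.Run thing so as not to block the UI thread)

Something like this: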

var yourForeachTask = Task.Run(() =>
        {
            Parallel.ForEach(ids, i =>
            {
                ICustomerRepo repo = new CustomerRepo();
                var cust = repo.GetCustomer(i).Result;
                customers.Add(cust);
            });
        });
await yourForeachTask;

Answer 5 (score: 7)

This should be pretty efficient, and easier than getting the whole TPL Dataflow working:

var customers = await ids.SelectAsync(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    return await repo.GetCustomer(i);
});

...

public static async Task<IList<TResult>> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector, int maxDegreesOfParallelism = 4)
{
    var results = new List<TResult>();

    var activeTasks = new HashSet<Task<TResult>>();
    foreach (var item in source)
    {
        activeTasks.Add(selector(item));
        if (activeTasks.Count >= maxDegreesOfParallelism)
        {
            var completed = await Task.WhenAny(activeTasks);
            activeTasks.Remove(completed);
            results.Add(completed.Result);
        }
    }

    results.AddRange(await Task.WhenAll(activeTasks));
    return results;
}

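Answer 6 (score: 4)

I'm a little late to the party, but you may want to consider using GetAwaiter().GetResult() to run your async code in a synchronous context, in parallel, as below: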

 Parallel.ForEach(ids, i =>
{
    ICustomerRepo repo = new CustomerRepo();
    // Run this in thread which Parallel library occupied.
    var cust = repo.GetCustomer(i).GetAwaiter().GetResult();
    customers.Add(cust);
});

Answer 7 (score: 3)

After introducing a bunch of helper methods, you will be able to run parallel queries with this simple syntax:

const int DegreeOfParallelism = 10;
IEnumerable<double> result = await Enumerable.Range(0, 1000000)
    .Split(DegreeOfParallelism)
    .SelectManyAsync(async i => await CalculateAsync(i).ConfigureAwait(false))
    .ConfigureAwait(false);

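What happens here is: we split the source collection into 10 chunks (.Split(DegreeOfParallelism)), then run 10 tasks, each processing its items one by one (.SelectManyAsync(...)), and merge the results back into a single list.

It's worth mentioning that there is a simpler approach: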

double[] result2 = await Enumerable.Range(0, 1000000)
    .Select(async i => await CalculateAsync(i).ConfigureAwait(false))
    .WhenAll()
    .ConfigureAwait(false);

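But it needs a precaution: if your source collection is too big, it will schedule a Task for every item right away, which may cause a significant performance hit.

The extension methods used in the examples above look as follows: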

public static class CollectionExtensions
{
    /// <summary>
    /// Splits collection into number of collections of nearly equal size.
    /// </summary>
    public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, int slicesCount)
    {
        if (slicesCount <= 0) throw new ArgumentOutOfRangeException(nameof(slicesCount));

        List<T> source = src.ToList();
        var sourceIndex = 0;
        for (var targetIndex = 0; targetIndex < slicesCount; targetIndex++)
        {
            var list = new List<T>();
            int itemsLeft = source.Count - targetIndex;
            while (slicesCount * list.Count < itemsLeft)
            {
                list.Add(source[sourceIndex++]);
            }

            yield return list;
        }
    }

    /// <summary>
    /// Takes collection of collections, projects those in parallel and merges results.
    /// </summary>
    public static async Task<IEnumerable<TResult>> SelectManyAsync<T, TResult>(
        this IEnumerable<IEnumerable<T>> source,
        Func<T, Task<TResult>> func)
    {
        List<TResult>[] slices = await source
            .Select(async slice => await slice.SelectListAsync(func).ConfigureAwait(false))
            .WhenAll()
            .ConfigureAwait(false);
        return slices.SelectMany(s => s);
    }

    /// <summary>Runs selector and awaits results.</summary>
    public static async Task<List<TResult>> SelectListAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector)
    {
        List<TResult> result = new List<TResult>();
        foreach (TSource source1 in source)
        {
            TResult result1 = await selector(source1).ConfigureAwait(false);
            result.Add(result1);
        }
        return result;
    }

    /// <summary>Wraps tasks with Task.WhenAll.</summary>
    public static Task<TResult[]> WhenAll<TResult>(this IEnumerable<Task<TResult>> source)
    {
        return Task.WhenAll<TResult>(source);
    }
}

Answer 8 (score: 2)

An extension method for this, which uses SemaphoreSlim and also allows you to set the maximum degree of parallelism:

    /// <summary>
    /// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
    /// </summary>
    /// <typeparam name="T">Type of the items in the IEnumerable</typeparam>
    /// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
    /// <param name="action">An async <see cref="Action" /> to execute</param>
    /// <param name="maxDegreeOfParallelism">Optional. An integer that represents the maximum degree of parallelism.
    /// Must be greater than 0.</param>
    /// <returns>A Task representing an async operation</returns>
    /// <exception cref="ArgumentOutOfRangeException">If maxDegreeOfParallelism is less than 1</exception>
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int? maxDegreeOfParallelism = null)
    {
        if (maxDegreeOfParallelism.HasValue)
        {
            using (var semaphoreSlim = new SemaphoreSlim(
                maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
            {
                var tasksWithThrottler = new List<Task>();

                foreach (var item in enumerable)
                {
                    // Increment the number of currently running tasks and wait if they are more than limit.
                    await semaphoreSlim.WaitAsync();

                    tasksWithThrottler.Add(Task.Run(async () =>
                    {
                        await action(item).ContinueWith(res =>
                        {
                            // action is completed, so decrement the number of currently running tasks
                            semaphoreSlim.Release();
                        });
                    }));
                }

                // Wait for all tasks to complete.
                await Task.WhenAll(tasksWithThrottler.ToArray());
            }
        }
        else
        {
            await Task.WhenAll(enumerable.Select(item => action(item)));
        }
    }

Sample usage:

await enumerable.ForEachAsyncConcurrent(
    async item =>
    {
        await SomeAsyncMethod(item);
    },
    5);

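Answer 9 (score: 0)

Here is a simple generic implementation of a ForEachAsync method, based on an ActionBlock from the TPL Dataflow library, which is now embedded in the .NET 5 platform: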

public static Task ForEachAsync<T>(this IEnumerable<T> source,
    Func<T, Task> action, int dop)
{
    // Arguments validation omitted
    var block = new ActionBlock<T>(action,
        new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = dop });
    foreach (var item in source) block.Post(item);
    block.Complete();
    return block.Completion;
}

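This solution eagerly enumerates the supplied IEnumerable and immediately sends all of its elements to the ActionBlock, so it is not well suited to enumerables with a huge number of elements. Below is a more sophisticated approach that enumerates the source lazily and sends its elements to the ActionBlock one by one: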

public static async Task ForEachAsync<T>(this IEnumerable<T> source,
    Func<T, Task> action, int dop)
{
    // Arguments validation omitted
    var block = new ActionBlock<T>(action, new ExecutionDataflowBlockOptions()
    { MaxDegreeOfParallelism = dop, BoundedCapacity = dop });
    foreach (var item in source)
    {
        if (!await block.SendAsync(item).ConfigureAwait(false)) break;
    }
    block.Complete();
    try { await block.Completion.ConfigureAwait(false); }
    catch
    {
        if (block.Completion.IsFaulted) throw block.Completion.Exception;
        throw;
    }
}

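These two methods behave differently in case of exceptions. The first¹ propagates an AggregateException that contains the exceptions directly in its InnerExceptions property. The second propagates an AggregateException that contains another AggregateException with the exceptions. Personally I find the behavior of the second method more convenient in practice, because awaiting it automatically eliminates one level of nesting, so I can simply catch (AggregateException aex) and handle the aex.InnerExceptions inside the catch block. The first method requires storing the Task before awaiting it, so that I can access task.Exception.InnerExceptions inside the catch block. For more information about propagating exceptions from async methods, look here.

¹ The first implementation elides async and await.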
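
The "store the Task before awaiting it" pattern described above can be sketched as follows. To stay dependency-free, this example uses Task.WhenAll as a stand-in for the task returned by the first ForEachAsync (an assumption for illustration). FailAsync and CountFailuresAsync are hypothetical names:

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionObservationSketch
{
    // Hypothetical operation that always fails asynchronously.
    private static async Task FailAsync(string message)
    {
        await Task.Yield();
        throw new InvalidOperationException(message);
    }

    // Returns how many failures are visible through the stored task.
    public static async Task<int> CountFailuresAsync()
    {
        // Keep a reference to the task so its Exception property stays reachable.
        Task all = Task.WhenAll(FailAsync("a"), FailAsync("b"));
        try
        {
            await all; // await rethrows only one of the inner exceptions
        }
        catch (InvalidOperationException)
        {
            // The stored task still exposes every failure.
            return all.Exception.InnerExceptions.Count;
        }
        return 0;
    }
}
```

Awaiting a task whose AggregateException holds several failures surfaces just one of them, which is why the task reference is needed to inspect the rest.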

Answer 10 (score: -1)

A simple native way without TPL:

int totalThreads = 0; int maxThreads = 3;

foreach (var item in YouList)
{
    while (totalThreads >= maxThreads) await Task.Delay(500);
    Interlocked.Increment(ref totalThreads);

    MyAsyncTask(item).ContinueWith((res) => Interlocked.Decrement(ref totalThreads));
}

You can check this solution with the following task:

async static Task MyAsyncTask(string item)
{
    await Task.Delay(2500);
    Console.WriteLine(item);
}
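
One thing to be aware of with the polling loop above: once the foreach ends, nothing waits for the tasks that are still in flight. A possible variation, sketched here under the assumption that the caller wants to wait for everything, keeps the started tasks and awaits them at the end. PollingThrottleSketch, WorkAsync and the delay values are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PollingThrottleSketch
{
    // Hypothetical stand-in for MyAsyncTask(item).
    private static async Task WorkAsync(string item)
    {
        await Task.Delay(50);
    }

    // Returns the number of items whose work has finished when the method completes.
    public static async Task<int> RunAsync(IEnumerable<string> items, int maxConcurrent)
    {
        int active = 0, processed = 0;
        var started = new List<Task>();

        foreach (var item in items)
        {
            // Poll until a slot frees up, as in the loop above.
            while (Volatile.Read(ref active) >= maxConcurrent)
                await Task.Delay(10);

            Interlocked.Increment(ref active);

            // Note: this continuation runs even if WorkAsync faults, and it hides that fault.
            started.Add(WorkAsync(item).ContinueWith(t =>
            {
                Interlocked.Increment(ref processed);
                Interlocked.Decrement(ref active);
            }));
        }

        // Wait for the tail of tasks that were still running when the loop ended.
        await Task.WhenAll(started);
        return processed;
    }
}
```

Polling with Task.Delay trades some latency for simplicity. The SemaphoreSlim approach in Answer 8 avoids the polling interval altogether.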