How to limit the amount of concurrent async I/O operations?

Asked: 2012-05-29 21:26:54

Tags: c# asynchronous task-parallel-library async-ctp async-await

// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };

// now let's send HTTP requests to each of these URLs in parallel
urls.AsParallel().ForAll(async (url) => {
    var client = new HttpClient();
    var html = await client.GetStringAsync(url);
});

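The problem here is that it starts 1000+ simultaneous web requests. Is there an easy way to limit the number of concurrent async HTTP requests, so that no more than 20 web pages are downloaded at any given time? How can this be done in the most efficient manner?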

15 answers:

Answer 0: (score: 138)

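You can definitely do this in the latest versions of async for .NET, using .NET 4.5 Beta. The previous post from 'usr' points to a good article written by Stephen Toub, but the less announced news is that the async semaphore actually made it into the Beta release of .NET 4.5.

If you look at our beloved SemaphoreSlim class (which you should be using, since it is more performant than the original Semaphore), it now boasts the WaitAsync(...) series of overloads, with all of the expected arguments: timeout intervals, cancellation tokens, all of your usual scheduling friends :)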
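To make the overloads mentioned above concrete, here is a minimal sketch of WaitAsync with both a timeout and a cancellation token. The class name WaitAsyncOverloadsDemo and the helper TryEnterAsync are hypothetical names chosen for illustration; only SemaphoreSlim.WaitAsync(TimeSpan, CancellationToken) comes from the framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitAsyncOverloadsDemo
{
    // Hypothetical helper: tries to take one slot from the semaphore.
    // Returns true if a slot was acquired, false if the timeout elapsed first.
    // Throws OperationCanceledException if the token is canceled while waiting.
    public static async Task<bool> TryEnterAsync(
        SemaphoreSlim throttler, TimeSpan timeout, CancellationToken token)
    {
        return await throttler.WaitAsync(timeout, token);
    }

    public static async Task Main()
    {
        var throttler = new SemaphoreSlim(initialCount: 1);
        using (var cts = new CancellationTokenSource())
        {
            bool first = await TryEnterAsync(throttler, TimeSpan.FromSeconds(1), cts.Token);

            // The only slot is already taken, so this wait times out and returns false.
            bool second = await TryEnterAsync(throttler, TimeSpan.FromMilliseconds(50), cts.Token);

            Console.WriteLine($"first={first}, second={second}");
            if (first) throttler.Release();
        }
    }
}
```

A false result means no slot was taken, so Release() must only be called after a successful wait.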

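Stephen has also written a more recent blog post, What's New for Parallelism in .NET 4.5 Beta, about the new .NET 4.5 goodies that came out with the beta.

Last, here's some sample code showing how to use SemaphoreSlim for async method throttling: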

public async Task MyOuterMethod()
{
    // let's say there is a list of 1000+ URLs
    string[] urls = { "http://google.com", "http://yahoo.com", ... };

    // now let's send HTTP requests to each of these URLs in parallel
    var allTasks = new List<Task>();
    var throttler = new SemaphoreSlim(initialCount: 20);
    foreach (var url in urls)
    {
        // do an async wait until we can schedule again
        await throttler.WaitAsync();

        // using Task.Run(...) to run the lambda in its own parallel
        // flow on the threadpool
        allTasks.Add(
            Task.Run(async () =>
            {
                try
                {
                    var client = new HttpClient();
                    var html = await client.GetStringAsync(url);
                }
                finally
                {
                    throttler.Release();
                }
            }));
    }

    // won't get here until all urls have been put into tasks
    await Task.WhenAll(allTasks);

    // won't get here until all tasks have completed in some way
    // (either success or exception)
}

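Last, but probably a worthy mention, is a solution that uses TPL-based scheduling. You can create delegate-bound tasks on the TPL that have not yet been started, and let a custom task scheduler limit the concurrency. In fact, there's an MSDN sample for it here:

See also TaskScheduler.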
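As a hedged illustration of the scheduler-based approach: instead of the custom scheduler from the MSDN sample, the sketch below swaps in the built-in ConcurrentExclusiveSchedulerPair (available since .NET 4.5), whose ConcurrentScheduler caps how many tasks run at once. The class name SchedulerThrottleDemo, the method RunAsync, and the Thread.Sleep placeholder are assumptions made for illustration. Note that a scheduler only limits delegate-bound (synchronous) work; an async lambda gives its slot back at the first await, so this does not throttle awaited I/O by itself.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerThrottleDemo
{
    // Runs `count` blocking work items on a scheduler capped at `maxConcurrency`
    // and returns the highest number of items observed running at the same time.
    public static async Task<int> RunAsync(int count, int maxConcurrency)
    {
        var scheduler = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrency).ConcurrentScheduler;
        var factory = new TaskFactory(scheduler);

        var gate = new object();
        int running = 0, peak = 0;

        var tasks = Enumerable.Range(0, count).Select(_ => factory.StartNew(() =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            Thread.Sleep(20); // placeholder for blocking work, e.g. a synchronous download
            lock (gate) { running--; }
        })).ToArray();

        await Task.WhenAll(tasks);
        return peak;
    }

    public static async Task Main()
    {
        int peak = await RunAsync(count: 50, maxConcurrency: 5);
        Console.WriteLine($"Observed peak concurrency: {peak}");
    }
}
```

For awaited HTTP calls, the SemaphoreSlim approach shown earlier in this answer is the better fit, because the semaphore slot is held across the await.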

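Answer 1: (score: 8)

Unfortunately, the .NET Framework is missing the most important combinators for orchestrating parallel async tasks. There is no such thing built in.

Look at the AsyncSemaphore class built by the most respectable Stephen Toub. What you want is called a semaphore, and you need an async version of it.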

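Answer 2: (score: 8)

If you have an IEnumerable (i.e. strings of URLs) and you want to run an I/O-bound operation on each of them concurrently (i.e. make an async HTTP request), and optionally you also want to set the maximum number of concurrent I/O requests, here is how you can do that. This way you do not use the thread pool et al.; the method uses SemaphoreSlim to control the maximum number of concurrent I/O requests, similar to a sliding window pattern: one request completes, leaves the semaphore, and the next one gets in.

Usage: await ForEachAsync(urlStrings, YourAsyncFunc, optionalMaxDegreeOfConcurrency);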

public static Task ForEachAsync<TIn>(
        IEnumerable<TIn> inputEnumerable,
        Func<TIn, Task> asyncProcessor,
        int? maxDegreeOfParallelism = null)
    {
        int maxAsyncThreadCount = maxDegreeOfParallelism ?? DefaultMaxDegreeOfParallelism;
        SemaphoreSlim throttler = new SemaphoreSlim(maxAsyncThreadCount, maxAsyncThreadCount);

        IEnumerable<Task> tasks = inputEnumerable.Select(async input =>
        {
            await throttler.WaitAsync().ConfigureAwait(false);
            try
            {
                await asyncProcessor(input).ConfigureAwait(false);
            }
            finally
            {
                throttler.Release();
            }
        });

        return Task.WhenAll(tasks);
    }

Answer 3: (score: 4)

Theo Yaung's example is nice, but there is a variant that does not keep a list of tasks to wait on.

 class SomeChecker
 {
    private const int ThreadCount=20;
    private CountdownEvent _countdownEvent;
    private SemaphoreSlim _throttler;

    public Task Check(IList<string> urls)
    {
        _countdownEvent = new CountdownEvent(urls.Count);
        _throttler = new SemaphoreSlim(ThreadCount); 

        return Task.Run( // prevent UI thread lock
            async  () =>{
                foreach (var url in urls)
                {
                    // do an async wait until we can schedule again
                    await _throttler.WaitAsync();
                    ProccessUrl(url); // NOT await
                }
                //instead of await Task.WhenAll(allTasks);
                _countdownEvent.Wait();
            });
    }

    private async Task ProccessUrl(string url)
    {
        try
        {
            var page = await new WebClient()
                       .DownloadStringTaskAsync(new Uri(url)); 
            ProccessResult(page);
        }
        finally
        {
            _throttler.Release();
            _countdownEvent.Signal();
        }
    }

    private void ProccessResult(string page){/*....*/}
}

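Answer 4: (score: 4)

There are a lot of pitfalls, and direct use of a semaphore can be tricky in error cases, so I would suggest using the AsyncEnumerator NuGet package instead of reinventing the wheel: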

// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };

// now let's send HTTP requests to each of these URLs in parallel
await urls.ParallelForEachAsync(async (url) => {
    var client = new HttpClient();
    var html = await client.GetStringAsync(url);
}, maxDegreeOfParallelism: 20);

Answer 5: (score: 2)

SemaphoreSlim can be very helpful here. Here's the extension method I've created.

    /// <summary>
    /// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
    /// </summary>
    /// <typeparam name="T">Type of IEnumerable</typeparam>
    /// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
    /// <param name="action">An async <see cref="Action" /> to execute</param>
    /// <param name="maxActionsToRunInParallel">Optional, max number of actions to run in parallel.
    /// Must be greater than 0</param>
    /// <returns>A Task representing an async operation</returns>
    /// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int? maxActionsToRunInParallel = null)
    {
        if (maxActionsToRunInParallel.HasValue)
        {
            using (var semaphoreSlim = new SemaphoreSlim(
                maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
            {
                var tasksWithThrottler = new List<Task>();

                foreach (var item in enumerable)
                {
                    // Increment the number of currently running tasks and wait if they are more than limit.
                    await semaphoreSlim.WaitAsync();

                    tasksWithThrottler.Add(Task.Run(async () =>
                    {
                        await action(item).ContinueWith(res =>
                        {
                            // action is completed, so decrement the number of currently running tasks
                            semaphoreSlim.Release();
                        });
                    }));
                }

                // Wait for all of the provided tasks to complete.
                await Task.WhenAll(tasksWithThrottler.ToArray());
            }
        }
        else
        {
            await Task.WhenAll(enumerable.Select(item => action(item)));
        }
    }

Sample usage:

await enumerable.ForEachAsyncConcurrent(
    async item =>
    {
        await SomeAsyncMethod(item);
    },
    5);

Answer 6: (score: 2)

A more concise version of https://stackoverflow.com/a/10810730/1186165:

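Answer 7: (score: 0)

While 1000 tasks might be queued very quickly, the Parallel Tasks library can only handle concurrent tasks equal to the number of CPU cores in the machine. That means that if you have a four-core machine, only 4 tasks will be executing at a given time (unless you lower the MaxDegreeOfParallelism).

Answer 8: (score: 0)

Parallel computations should be used for speeding up CPU-bound operations. Here we are talking about I/O-bound operations. Your implementation should be purely async, unless you are overwhelming the busy single core on your multi-core CPU.

EDIT: I like the suggestion made by usr to use an "async semaphore" here.

Answer 9: (score: 0)

Use MaxDegreeOfParallelism, which is an option you can specify in Parallel.ForEach():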

var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };

Parallel.ForEach(urls, options,
    url =>
        {
            var client = new HttpClient();
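            // Parallel.ForEach does not await async work, so block on the task here;
            // otherwise the loop body returns immediately and the limit has no effect.
            var html = client.GetStringAsync(url).GetAwaiter().GetResult();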
            // do stuff with html
        });

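Answer 10: (score: 0)

Old question, new answer. @vitidev had a block of code that was reused almost intact in a project I reviewed. After discussing it with a few colleagues, one asked, "Why don't you just use the built-in TPL methods?" ActionBlock looks like the winner there. https://msdn.microsoft.com/en-us/library/hh194773(v=vs.110).aspx. It probably won't end up changing any existing code, but I will definitely look to adopt this approach and reuse Mr. Softy's best practice for throttled parallelism.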
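The answer does not include code, so here is a minimal sketch of what the ActionBlock approach could look like. The class name ActionBlockThrottleDemo and the helper ForEachThrottledAsync are hypothetical; ActionBlock, ExecutionDataflowBlockOptions and MaxDegreeOfParallelism come from System.Threading.Tasks.Dataflow, which on .NET Framework is installed as a NuGet package of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockThrottleDemo
{
    // Hypothetical helper: processes every item with `processAsync`,
    // running at most `maxConcurrency` invocations at the same time.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items, Func<T, Task> processAsync, int maxConcurrency)
    {
        var block = new ActionBlock<T>(processAsync,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxConcurrency });

        // The block is unbounded by default, so Post only queues the item.
        foreach (var item in items) block.Post(item);

        block.Complete();       // no more input will be posted
        await block.Completion; // completes once every queued item has been processed
    }

    public static async Task Main()
    {
        string[] urls = { "http://google.com", "http://yahoo.com" };
        using (var client = new HttpClient())
        {
            await ForEachThrottledAsync(urls, async url =>
            {
                var html = await client.GetStringAsync(url);
                Console.WriteLine($"{url}: {html.Length} chars");
            }, maxConcurrency: 20);
        }
    }
}
```

One design point to be aware of: if the delegate throws, the block faults, stops processing the remaining items, and the exception surfaces when block.Completion is awaited.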

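Answer 11: (score: 0)

Here is a solution that takes advantage of the lazy nature of LINQ. It has the advantage of not spawning threads (as the accepted answer does), and of not creating all the tasks at once with almost all of them blocked on a SemaphoreSlim, as happens with the SemaphoreSlim-based solutions. First, let's make it work without throttling. The first step is to convert our URLs to an enumerable of tasks.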

string[] urls =
{
    "https://stackoverflow.com",
    "https://superuser.com",
    "https://serverfault.com",
    "https://meta.stackexchange.com",
    // ...
};
var tasks = urls.Select(async (url) =>
{
    using (var client = new HttpClient())
    {
        return (Url: url, Html: await client.GetStringAsync(url));
    }
});

The second step is to await all the tasks concurrently using the Task.WhenAll method:

var results = await Task.WhenAll(tasks);
foreach (var result in results)
{
    Console.WriteLine($"Url: {result.Url}, {result.Html.Length:#,0} chars");
}

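Output:

Url: https://stackoverflow.com, 105.574 chars
Url: https://superuser.com, 126.953 chars
Url: https://serverfault.com, 125.963 chars
Url: https://meta.stackexchange.com, 185.276 chars
...

Microsoft's implementation of Task.WhenAll immediately materializes the supplied enumerable into an array, which causes all the tasks to start at once. We don't want that, because we want to limit the number of concurrent asynchronous operations. So we need to implement an alternative WhenAll that enumerates our enumerable gradually. We will do that by creating a number of worker tasks (equal to the desired degree of parallelism). Each worker task enumerates our enumerable one task at a time, using a lock to ensure that each URL task is processed by only one worker task. Then we await the completion of all the worker tasks, and finally we return the results after restoring their order. Here is the implementation: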

public static async Task<T[]> WhenAll<T>(IEnumerable<Task<T>> tasks,
    int degreeOfParallelism)
{
    if (tasks is ICollection<Task<T>>) throw new ArgumentException(
        "The enumerable should not be materialized.", nameof(tasks));
    var results = new List<(int Index, T Result)>();
    var failed = false;
    using (var enumerator = tasks.GetEnumerator())
    {
        int index = 0;
        var workerTasks = Enumerable.Range(0, degreeOfParallelism)
        .Select(async _ =>
        {
            try
            {
                while (true)
                {
                    Task<T> task;
                    int localIndex;
                    lock (enumerator)
                    {
                        if (failed || !enumerator.MoveNext()) break;
                        task = enumerator.Current;
                        localIndex = index++;
                    }
                    var result = await task.ConfigureAwait(false);
                    lock (results) results.Add((localIndex, result));
                }
            }
            catch
            {
                lock (enumerator) failed = true;
                throw;
            }
        }).ToArray();
        await Task.WhenAll(workerTasks).ConfigureAwait(false);
    }
    return results.OrderBy(e => e.Index).Select(e => e.Result).ToArray();
}

...and here is what we must change in our initial code to achieve the desired throttling:

var results = await WhenAll(tasks, degreeOfParallelism: 2);

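There is a difference regarding the handling of exceptions. The native Task.WhenAll waits for all tasks to complete and aggregates all the exceptions. The implementation above stops waiting promptly after the first faulted task completes.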

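Answer 12: (score: -1)

Essentially, you will want to create an Action or Task for each URL that you want to hit, put them in a List, and then process that list, limiting the number that can be processed in parallel.

My blog post shows how to do this both with Tasks and with Actions, and provides a sample project you can download and run to see both in action.

With Actions

If using Actions, you can use the built-in .NET Parallel.Invoke function. Here we limit it to running at most 20 threads in parallel.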

var listOfActions = new List<Action>();
foreach (var url in urls)
{
    var localUrl = url;
    // Note that we create the Action here, but do not invoke it.
    listOfActions.Add(() => CallUrl(localUrl));
}

var options = new ParallelOptions {MaxDegreeOfParallelism = 20};
Parallel.Invoke(options, listOfActions.ToArray());

With Tasks

With Tasks there is no built-in function. However, you can use the one that I provide on my blog.

    /// <summary>
    /// Starts the given tasks and waits for them to complete. This will run, at most, the specified number of tasks in parallel.
    /// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
    /// </summary>
    /// <param name="tasksToRun">The tasks to run.</param>
    /// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
    /// <param name="cancellationToken">The cancellation token.</param>
    public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, CancellationToken cancellationToken = new CancellationToken())
    {
        await StartAndWaitAllThrottledAsync(tasksToRun, maxTasksToRunInParallel, -1, cancellationToken);
    }

    /// <summary>
    /// Starts the given tasks and waits for them to complete. This will run the specified number of tasks in parallel.
    /// <para>NOTE: If a timeout is reached before the Task completes, another Task may be started, potentially running more than the specified maximum allowed.</para>
    /// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
    /// </summary>
    /// <param name="tasksToRun">The tasks to run.</param>
    /// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
    /// <param name="timeoutInMilliseconds">The maximum milliseconds we should allow the max tasks to run in parallel before allowing another task to start. Specify -1 to wait indefinitely.</param>
    /// <param name="cancellationToken">The cancellation token.</param>
    public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, int timeoutInMilliseconds, CancellationToken cancellationToken = new CancellationToken())
    {
        // Convert to a list of tasks so that we don't enumerate over it multiple times needlessly.
        var tasks = tasksToRun.ToList();

        using (var throttler = new SemaphoreSlim(maxTasksToRunInParallel))
        {
            var postTaskTasks = new List<Task>();

            // Have each task notify the throttler when it completes so that it decrements the number of tasks currently running.
            tasks.ForEach(t => postTaskTasks.Add(t.ContinueWith(tsk => throttler.Release())));

            // Start running each task.
            foreach (var task in tasks)
            {
                // Increment the number of tasks currently running and wait if too many are running.
                await throttler.WaitAsync(timeoutInMilliseconds, cancellationToken);

                cancellationToken.ThrowIfCancellationRequested();
                task.Start();
            }

            // Wait for all of the provided tasks to complete.
            // We wait on the list of "post" tasks instead of the original tasks, otherwise there is a potential race condition where the throttler's using block is exited before some Tasks have had their "post" action completed, which references the throttler, resulting in an exception due to accessing a disposed object.
            await Task.WhenAll(postTaskTasks.ToArray());
        }
    }

Then, to create your list of Tasks and call the function to have them run, with at most 20 running simultaneously, you could do this:

var listOfTasks = new List<Task>();
foreach (var url in urls)
{
    var localUrl = url;
    // Note that we create the Task here, but do not start it.
    listOfTasks.Add(new Task(async () => await CallUrl(localUrl)));
}
await Tasks.StartAndWaitAllThrottledAsync(listOfTasks, 20);

Answer 13: (score: -1)

This is not a general solution for async, but in the case of HttpClient you can try

System.Net.ServicePointManager.DefaultConnectionLimit = 20;
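A related setting exists on the handler itself, which may be useful where the global ServicePointManager value is not what HttpClient consults (on .NET Core and later, HttpClient does not go through ServicePointManager). The sketch below is an illustration under that assumption; ConnectionLimitDemo and CreateLimitedHandler are hypothetical names, while HttpClientHandler.MaxConnectionsPerServer is a framework property. Both this property and DefaultConnectionLimit apply per server, so with 1000 different hosts neither caps the total number of requests in flight.

```csharp
using System;
using System.Net.Http;

public static class ConnectionLimitDemo
{
    // Hypothetical helper: builds a handler that opens at most
    // `maxConnectionsPerServer` simultaneous connections to any single server.
    public static HttpClientHandler CreateLimitedHandler(int maxConnectionsPerServer)
    {
        return new HttpClientHandler { MaxConnectionsPerServer = maxConnectionsPerServer };
    }

    public static void Main()
    {
        using (var client = new HttpClient(CreateLimitedHandler(20)))
        {
            // Requests sent through this client to the same host share at most 20 connections.
            Console.WriteLine("Client created with a per-server connection limit.");
        }
    }
}
```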

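Answer 14: (score: -1)

This is my second answer, with a possibly improved version of Theo Yaung's solution (the accepted answer). It is also based on a SemaphoreSlim and enumerates the URLs lazily, but it does not rely on Task.WhenAll to await the completion of the tasks. The SemaphoreSlim is used for that purpose too. This can be an advantage, because it means that the completed tasks need not be referenced during the whole operation. Instead, each task is eligible for garbage collection immediately after it completes.

Two overloads of the ForEachAsync extension method are provided (the name is borrowed from Dogu Arslan's answer, the second most popular answer). One is for tasks that return a result, and one is for tasks that do not. A nice extra feature is the onErrorContinue parameter, which controls the behavior in case of exceptions. The default is false, which mimics the behavior of Parallel.ForEach (stops processing shortly after an exception) rather than the behavior of Task.WhenAll (waits for all tasks to complete).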

public static async Task<TResult[]> ForEachAsync<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, Task<TResult>> taskFactory,
    int concurrencyLevel = 1,
    bool onErrorContinue = false)
{
    // Arguments validation omitted
    var throttler = new SemaphoreSlim(concurrencyLevel);
    var results = new List<TResult>();
    var exceptions = new ConcurrentQueue<Exception>();
    int index = 0;
    foreach (var item in source)
    {
        var localIndex = index++;
        lock (results) results.Add(default); // Reserve space in the list
        await throttler.WaitAsync(); // continue on captured context
        if (!onErrorContinue && !exceptions.IsEmpty) { throttler.Release(); break; }

        Task<TResult> task;
        try { task = taskFactory(item); } // or Task.Run(() => taskFactory(item))
        catch (Exception ex)
        {
            exceptions.Enqueue(ex); throttler.Release();
            if (onErrorContinue) continue; else break;
        }

        _ = task.ContinueWith(t =>
        {
            try { lock (results) results[localIndex] = t.GetAwaiter().GetResult(); }
            catch (Exception ex) { exceptions.Enqueue(ex); }
            finally { throttler.Release(); }
        }, default, TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }

    // Wait for the last operations to complete
    for (int i = 0; i < concurrencyLevel; i++)
    {
        await throttler.WaitAsync().ConfigureAwait(false);
    }
    if (!exceptions.IsEmpty) throw new AggregateException(exceptions);
    lock (results) return results.ToArray();
}

public static Task ForEachAsync<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, Task> taskFactory,
    int concurrencyLevel = 1,
    bool onErrorContinue = false)
{
    // Arguments validation omitted
    return ForEachAsync<TSource, object>(source, async item =>
    {
        await taskFactory(item).ConfigureAwait(false); return null;
    }, concurrencyLevel, onErrorContinue);
}

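The taskFactory is invoked in the context of the caller. This can be desirable because it allows, for example, UI elements to be accessed inside the lambda. If it is preferable to invoke it in the ThreadPool context, you can replace taskFactory(item) with Task.Run(() => taskFactory(item)).

To keep things simple, the Task ForEachAsync overload is implemented, not optimally, by calling the generic Task<TResult[]> overload.

Usage example: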

await urls.ForEachAsync(async url =>
{
    var html = await httpClient.GetStringAsync(url);
    TextBox1.AppendText($"Url: {url}, {html.Length:#,0} chars\r\n");
}, concurrencyLevel: 10, onErrorContinue: true);