Why is Task.WhenAny so slow when called many times against the same TaskCompletionSource?

Time: 2017-07-21 20:40:24

Tags: .net performance async-await task-parallel-library

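If a class has a long-lived member TaskCompletionSource<TResult> m_tcs, and Task.WhenAny is called with m_tcs.Task as one of its arguments, performance seems to degrade sharply, far worse than linearly, once the number of calls exceeds 50,000 or so.

Why is it so slow in this case? Is there an alternative that runs faster without using 4x more memory?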

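My guess is that Task.WhenAny adds and removes a large number of continuations to and from m_tcs.Task, and that somewhere in that process the work becomes O(N²).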
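To illustrate how such a pattern could turn quadratic, here is a toy model. It assumes (this is not verified against the actual Task implementation) that all continuations of one task live in a single list, and that removing a continuation means a linear search that leaves a hole instead of compacting the list. The names `ContinuationListModel` and `CountSlotsInspected` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ContinuationListModel
{
    // Registers n continuations on one shared list, then removes each one
    // by linear search. Returns the total number of slots inspected.
    public static long CountSlotsInspected(int n)
    {
        var continuations = new List<object>(n);
        var handles = new object[n];
        for (int i = 0; i < n; i++)
        {
            handles[i] = new object();
            continuations.Add(handles[i]);
        }

        long inspected = 0;
        foreach (var handle in handles)
        {
            for (int slot = 0; slot < continuations.Count; slot++)
            {
                inspected++;
                if (ReferenceEquals(continuations[slot], handle))
                {
                    continuations[slot] = null; // assumption: leave a hole, do not compact
                    break;
                }
            }
        }
        return inspected;
    }

    public static void Main()
    {
        foreach (var n in new[] { 1000, 2000, 4000 })
            Console.WriteLine("{0,6:#,0} registrations -> {1,12:#,0} slots inspected", n, CountSlotsInspected(n));
    }
}
```

In this model the i-th removal has to walk past i earlier slots, so doubling N roughly quadruples the total work. A short-lived TCS never accumulates more than one continuation, which would explain why the baseline does not show the effect.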

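I found a more performant alternative by wrapping the TCS in an async function that awaits m_tcs.Task. It uses about 4x the memory, but runs much faster beyond 20,000 iterations or so.

Sample code is below (for accurate results, compile and run the .exe directly, without a debugger attached). Note that WhenAnyMemberTcsDirect has the performance problem, WhenAnyMemberTcsIndirect is the faster alternative, and WhenAnyLocalTcs is the baseline for comparison: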

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public class WithTcs
{
    // long-lived TaskCompletionSource
    private readonly TaskCompletionSource<bool> m_tcs = new TaskCompletionSource<bool>();

    // this has performance issues for large N - O(N^2)
    public async Task WhenAnyMemberTcsDirectAsync(Task task)
    {
        await await Task.WhenAny(task, m_tcs.Task).ConfigureAwait(false);
    }

    // performs faster - O(N), but uses 4x memory
    public async Task WhenAnyMemberTcsIndirectAsync(Task task)
    {
        await await Task.WhenAny(task, AwaitTcsTaskAsync(m_tcs)).ConfigureAwait(false);
    }

    private async Task<TResult> AwaitTcsTaskAsync<TResult>(TaskCompletionSource<TResult> tcs)
    {
        return await tcs.Task.ConfigureAwait(false);
    }

    // baseline for comparison using short-lived TCS
    public async Task WhenAnyLocalTcsAsync(Task task)
    {
        var tcs = new TaskCompletionSource<bool>();
        await await Task.WhenAny(task, tcs.Task).ConfigureAwait(false);
    }
}

class Program
{
    static void Main(string[] args)
    {
        show_warning_if_debugger_attached();

        MainAsync().GetAwaiter().GetResult();

        show_warning_if_debugger_attached();
        Console.ReadLine();
    }

    static async Task MainAsync()
    {
        const int n = 100000;

        Console.WriteLine("Running Task.WhenAny tests ({0:#,0} iterations)", n);
        Console.WriteLine();

        await WhenAnyLocalTcs(n).ConfigureAwait(false);

        await Task.Delay(1000).ConfigureAwait(false);

        await WhenAnyMemberTcsIndirect(n).ConfigureAwait(false);

        await Task.Delay(1000).ConfigureAwait(false);

        await WhenAnyMemberTcsDirect(n).ConfigureAwait(false);
    }

    static Task WhenAnyLocalTcs(int n)
    {
        Func<WithTcs, Task, Task> function =
            (instance, task) => instance.WhenAnyLocalTcsAsync(task);

        return RunTestAsync(n, function);
    }

    static Task WhenAnyMemberTcsIndirect(int n)
    {
        Func<WithTcs, Task, Task> function =
            (instance, task) => instance.WhenAnyMemberTcsIndirectAsync(task);

        return RunTestAsync(n, function);
    }

    static Task WhenAnyMemberTcsDirect(int n)
    {
        Func<WithTcs, Task, Task> function =
            (instance, task) => instance.WhenAnyMemberTcsDirectAsync(task);

        return RunTestAsync(n, function);
    }

    static async Task RunTestAsync(int n, Func<WithTcs, Task, Task> function, [CallerMemberName] string name = "")
    {
        Console.WriteLine(name);

        var tasks = new Task[n];
        var sw = new Stopwatch();
        var startBytes = GC.GetTotalMemory(true);
        sw.Start();

        var instance = new WithTcs();
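        var step = Math.Max(1, n / 78); // progress-dot interval; at least 1 so the modulo below never divides by zero when n < 78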
        for (int i = 0; i < n; i++)
        {
            var iTemp = i;
            Task primaryTask = Task.Run(() => { if (iTemp % step == 0) Console.Write("."); });
            tasks[i] = function(instance, primaryTask);
        }

        await Task.WhenAll(tasks).ConfigureAwait(false);
        Console.WriteLine();

        var endBytes = GC.GetTotalMemory(true);
        sw.Stop();
        GC.KeepAlive(instance);
        GC.KeepAlive(tasks);

        Console.WriteLine("  Time: {0,7:#,0} ms, Memory: {1,10:#,0} bytes", sw.ElapsedMilliseconds, endBytes - startBytes);
        Console.WriteLine();
    }

    static void show_warning_if_debugger_attached()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("WARNING: running with the debugger attached may result in inaccurate results\r\n".ToUpper());
    }
}

Sample results:

Iterations | WhenAny* Method   | Time (ms) | Memory (bytes)
---------: | ----------------- | --------: | -------------:
     1,000 | LocalTcs          |        21 |         58,248
     1,000 | MemberTcsIndirect |        54 |        217,268
     1,000 | MemberTcsDirect   |        21 |         52,496
    10,000 | LocalTcs          |        91 |        545,836
    10,000 | MemberTcsIndirect |        98 |      2,141,836
    10,000 | MemberTcsDirect   |       140 |        545,640
   100,000 | LocalTcs          |       210 |      4,898,512
   100,000 | MemberTcsIndirect |       502 |     21,426,316
   100,000 | MemberTcsDirect   |    14,090 |      5,085,396
   200,000 | LocalTcs          |       366 |      9,630,872
   200,000 | MemberTcsIndirect |       659 |     41,450,916
   200,000 | MemberTcsDirect   |    42,599 |     10,069,248
   500,000 | LocalTcs          |       808 |     23,670,492
   500,000 | MemberTcsIndirect |     1,906 |     97,339,192
   500,000 | MemberTcsDirect   |   288,373 |     24,968,436
 1,000,000 | LocalTcs          |     1,642 |     47,272,744
 1,000,000 | MemberTcsIndirect |     3,149 |    200,480,888
 1,000,000 | MemberTcsDirect   | 1,268,030 |     48,064,772
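
As a rough check on how each method scales, the growth exponent k in time ≈ c·N^k can be estimated from two rows of the table. The sketch below uses the 100,000 and 1,000,000 iteration rows; the helper name `ScalingEstimate.Exponent` is hypothetical, and a single measurement per row only supports a coarse estimate.

```csharp
using System;

public static class ScalingEstimate
{
    // Estimates k in time ~ N^k from two (N, time) measurements.
    public static double Exponent(double n1, double t1, double n2, double t2)
    {
        return Math.Log(t2 / t1) / Math.Log(n2 / n1);
    }

    public static void Main()
    {
        // Times in ms copied from the 100,000 and 1,000,000 iteration rows of the table.
        Console.WriteLine("LocalTcs:          {0:F2}", Exponent(100000, 210, 1000000, 1642));
        Console.WriteLine("MemberTcsIndirect: {0:F2}", Exponent(100000, 502, 1000000, 3149));
        Console.WriteLine("MemberTcsDirect:   {0:F2}", Exponent(100000, 14090, 1000000, 1268030));
    }
}
```

The exponent comes out close to 2 for MemberTcsDirect and below 1 for the other two methods, which fits quadratic rather than exponential growth.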

Note: targeting .NET 4.6.2 (Any CPU); tested on Windows 7 SP1 64-bit, Intel Core i7-4770.

1 Answer:

Answer 0: (score: 1)

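I found a solution that seems to run fast (O(N) time) and uses about the same amount of memory, by keeping a CancellationTokenSource m_cts alongside the member TaskCompletionSource. Any call that previously set m_tcs to canceled/faulted/result now needs to be accompanied by m_cts.Cancel(). This can of course be abstracted away.

Solution: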

public class WithTcs
{
    // ... same as above, plus below (this also needs `using System.Threading;` at the top of the file)

    private readonly CancellationTokenSource m_cts = new CancellationTokenSource();

    public async Task WhenAnyMemberCtsAsync(Task task)
    {
        var ct = m_cts.Token;
        var tcs = new TaskCompletionSource<bool>();
        using (ct.Register(() => tcs.TrySetFrom(m_tcs)))
            await await Task.WhenAny(task, tcs.Task).ConfigureAwait(false);
    }
}

public static class TcsExtensions
{
    public static bool TrySetFrom<TResult>(this TaskCompletionSource<TResult> dest, TaskCompletionSource<TResult> source)
    {
        switch (source.Task.Status)
        {
            case TaskStatus.Canceled:
                return dest.TrySetCanceled();
            case TaskStatus.Faulted:
                return dest.TrySetException(source.Task.Exception.InnerExceptions);
            case TaskStatus.RanToCompletion:
                return dest.TrySetResult(source.Task.Result);
            default:
                return false; // TCS has not yet completed
        }
    }
}
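
One way the pairing of m_tcs and m_cts might be abstracted away is sketched below. `SharedSignal<TResult>` and its members are hypothetical names, not part of the answer above; the sketch assumes completion always goes through the wrapper, and it never disposes the CancellationTokenSource.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: owns both the long-lived TCS and the CTS so that
// callers cannot forget the accompanying Cancel() call.
public sealed class SharedSignal<TResult>
{
    private readonly TaskCompletionSource<TResult> m_tcs = new TaskCompletionSource<TResult>();
    private readonly CancellationTokenSource m_cts = new CancellationTokenSource();

    public bool TrySetResult(TResult result) { return NotifyIf(m_tcs.TrySetResult(result)); }
    public bool TrySetException(Exception exception) { return NotifyIf(m_tcs.TrySetException(exception)); }
    public bool TrySetCanceled() { return NotifyIf(m_tcs.TrySetCanceled()); }

    // Cancels m_cts only after m_tcs has completed, so callbacks see a final status.
    private bool NotifyIf(bool completedNow)
    {
        if (completedNow) m_cts.Cancel();
        return completedNow;
    }

    // Same shape as WhenAnyMemberCtsAsync: one short-lived TCS per call.
    public async Task WhenAnyAsync(Task task)
    {
        var tcs = new TaskCompletionSource<TResult>();
        using (m_cts.Token.Register(() => CopyOutcomeTo(tcs)))
            await await Task.WhenAny(task, tcs.Task).ConfigureAwait(false);
    }

    // Mirrors the outcome of the long-lived TCS onto a per-call TCS.
    private void CopyOutcomeTo(TaskCompletionSource<TResult> dest)
    {
        switch (m_tcs.Task.Status)
        {
            case TaskStatus.Canceled: dest.TrySetCanceled(); break;
            case TaskStatus.Faulted: dest.TrySetException(m_tcs.Task.Exception.InnerExceptions); break;
            case TaskStatus.RanToCompletion: dest.TrySetResult(m_tcs.Task.Result); break;
        }
    }
}

public static class SharedSignalDemo
{
    public static void Main()
    {
        var signal = new SharedSignal<bool>();
        Task never = new TaskCompletionSource<bool>().Task; // stands in for a slow primary task
        Task waiter = signal.WhenAnyAsync(never);

        Console.WriteLine("Completed before signal: {0}", waiter.IsCompleted);
        signal.TrySetResult(true);
        Console.WriteLine("Completed after signal:  {0}", waiter.Wait(1000));
    }
}
```

Because the CTS is cancelled only when a TrySet* call actually wins, a registration callback never observes an incomplete m_tcs, and a waiter that registers after cancellation has its callback run immediately.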

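This answers the question of whether there is a fast alternative that is also memory-efficient. I am still curious about what is going on behind the scenes to cause the O(N²) behavior in WhenAnyMemberTcsDirect.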