Question

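I was watching The zen of async: Best practices for best performance, and Stephen Toub started to talk about Task caching: instead of caching the results of task jobs, you cache the tasks themselves. As far as I understand, starting a new task for every job is expensive and should be minimized as much as possible. At around 28:00 he showed this method: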

private static readonly ConcurrentDictionary<string, string> s_urlToContents =
    new ConcurrentDictionary<string, string>();

public static async Task<string> GetContentsAsync(string url)
{
    string contents;
    if (!s_urlToContents.TryGetValue(url, out contents))
    {
        // Cache miss: download the page, then remember the resulting string.
        var response = await new HttpClient().GetAsync(url);
        contents = await response.EnsureSuccessStatusCode().Content.ReadAsStringAsync();
        s_urlToContents.TryAdd(url, contents);
    }
    return contents;
}

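At first glance this looks like a well-thought-out method: you cache the results. I didn't even think about caching the job of getting the contents.

Then he showed this method: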

private static readonly ConcurrentDictionary<string, Task<string>> s_urlToContents =
    new ConcurrentDictionary<string, Task<string>>();

public static Task<string> GetContentsAsync(string url)
{
    Task<string> contents;
    if (!s_urlToContents.TryGetValue(url, out contents))
    {
        contents = GetContentsInternalAsync(url);
        // Cache the task itself, but only once it has completed successfully.
        contents.ContinueWith(t => { s_urlToContents.TryAdd(url, t); },
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnRanToCompletion |
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
    return contents;
}

// Renamed from GetContentsAsync for this listing: two methods with the same
// name and parameter list would not compile.
private static async Task<string> GetContentsInternalAsync(string url)
{
    var response = await new HttpClient().GetAsync(url);
    return await response.EnsureSuccessStatusCode().Content.ReadAsStringAsync();
}

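I can't understand how this actually helps more than just storing the results.

Does this mean that you're using fewer Tasks to get the data?

Also, how do we know when to cache tasks? As far as I understand, if you cache in the wrong place you just add overhead and put too much stress on the system.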

Answer 1

"I can't understand how this actually helps more than just storing the results."

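When a method is marked with the async modifier, the compiler automatically transforms the underlying method into a state machine, as Stephen demonstrated in the previous slides. This means that calling the first method will always create a Task, even when the contents are already in the cache.

In the second example, notice that Stephen removed the async modifier, and the signature of the method is now public static Task<string> GetContentsAsync(string url). This means that the responsibility for creating the Task lies with the implementer of the method and not with the compiler. By caching the Task<string>, the only "penalty" of creating a Task (actually two tasks, since ContinueWith also creates one) is paid when the entry is not in the cache, rather than on every method call.

In this particular example, IMO, the point was not to reuse the network operation that is already in flight when the first task executes, but simply to reduce the number of allocated Task objects.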
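The allocation difference can be observed directly with reference equality. The following is a minimal sketch, not code from the talk: the names TaskAllocationDemo, GetAsyncWrapped and GetCached are made up for illustration, and an already-completed Task stands in for a finished download.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class TaskAllocationDemo
{
    // Hypothetical cache, pre-populated with one completed task.
    private static readonly ConcurrentDictionary<string, Task<string>> s_cache =
        new ConcurrentDictionary<string, Task<string>>();

    static TaskAllocationDemo()
    {
        s_cache["key"] = Task.FromResult("contents");
    }

    // async version: the compiler-generated state machine wraps the result
    // in a new Task<string> on each call, even though the cached task is done.
    public static async Task<string> GetAsyncWrapped(string key)
    {
        return await s_cache[key];
    }

    // Non-async version: hands back the cached Task<string> object itself.
    public static Task<string> GetCached(string key)
    {
        return s_cache[key];
    }

    public static void Main()
    {
        // Two calls to the async wrapper yield two distinct Task objects.
        Console.WriteLine(object.ReferenceEquals(
            GetAsyncWrapped("key"), GetAsyncWrapped("key"))); // False
        // Two calls to the non-async method yield the same Task object.
        Console.WriteLine(object.ReferenceEquals(
            GetCached("key"), GetCached("key")));             // True
    }
}
```

Both methods produce the same string when awaited; only the number of Task objects allocated per call differs.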

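"How do we know when to cache tasks?"

Think of caching a Task like caching anything else, and the question can be viewed from a broader perspective: when should I cache something? The answer is broad, but I think the most common use case is an expensive operation that sits on the hot path of your application. Should you always cache tasks? Definitely not. The overhead of the state-machine allocation is usually negligible. If needed, profile your app, and then (and only then) consider whether caching would help in your particular use case.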

Answer 2

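Let's assume you are talking to a remote service that takes the name of a city and returns its zip codes. The service is remote and under load, so we are talking about a method with an asynchronous signature: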

interface IZipCodeService
{
    Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName);
}

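Since the service needs a while for every request, we would like to implement a local cache for it. Naturally, the cache will also have an asynchronous signature, maybe even implementing the same interface (see the Facade pattern). A synchronous signature would break the best practice of never calling asynchronous code synchronously with .Wait(), .Result or similar. At the very least, the cache should leave that decision to the caller.

So let's do a first iteration on this: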

class ZipCodeCache : IZipCodeService
{
    private readonly IZipCodeService realService;
    private readonly ConcurrentDictionary<string, ICollection<ZipCode>> zipCache = new ConcurrentDictionary<string, ICollection<ZipCode>>();

    public ZipCodeCache(IZipCodeService realService)
    {
        this.realService = realService;
    }

    public Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName)
    {
        ICollection<ZipCode> zipCodes;
        if (zipCache.TryGetValue(cityName, out zipCodes))
        {
            // Already in cache. Returning cached value
            return Task.FromResult(zipCodes);
        }
        return this.realService.GetZipCodesAsync(cityName).ContinueWith((task) =>
        {
            this.zipCache.TryAdd(cityName, task.Result);
            return task.Result;
        });
    }
}

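As you can see, the cache does not store Task objects but the returned ZipCode collections. By doing so, it has to construct a Task for every cache hit by calling Task.FromResult, and I think that is exactly what Stephen Toub tries to avoid. A Task object comes with overhead, especially for the garbage collector, because every cache hit allocates a new object that becomes garbage shortly afterwards.

The only way to work around this is to cache the whole Task object: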

class ZipCodeCache2 : IZipCodeService
{
    private readonly IZipCodeService realService;
    private readonly ConcurrentDictionary<string, Task<ICollection<ZipCode>>> zipCache = new ConcurrentDictionary<string, Task<ICollection<ZipCode>>>();

    public ZipCodeCache2(IZipCodeService realService)
    {
        this.realService = realService;
    }

    public Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName)
    {
        Task<ICollection<ZipCode>> zipCodes;
        if (zipCache.TryGetValue(cityName, out zipCodes))
        {
            return zipCodes;
        }
        return this.realService.GetZipCodesAsync(cityName).ContinueWith((task) =>
        {
            this.zipCache.TryAdd(cityName, task);
            return task.Result;
        });
    }
}

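As you can see, the creation of Tasks by calling Task.FromResult is gone. Furthermore, it is not possible to avoid this Task creation when using the async/await keywords, because internally they create a Task to return no matter what your code has cached. Something like: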

    public async Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName)
    {
        Task<ICollection<ZipCode>> zipCodes;
        if (zipCache.TryGetValue(cityName, out zipCodes))
        {
            return zipCodes;
        }

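will not compile, because an async method declared as returning Task<T> must return a T, not a Task<T>.

Don't get confused by Stephen Toub's ContinueWith flags TaskContinuationOptions.OnlyOnRanToCompletion and TaskContinuationOptions.ExecuteSynchronously. ExecuteSynchronously is (only) another performance optimization, and OnlyOnRanToCompletion keeps failed or canceled tasks out of the cache. Neither is related to the main objective of caching Tasks.

As with every cache, you should consider some mechanism that cleans the cache from time to time and removes entries that are too old or invalid. You could also implement a policy that limits the cache to n entries and tries to keep the most frequently requested items by introducing some counting.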
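One way such a cleanup could look is an age limit per entry. The class below is only a sketch under assumptions: ExpiringTaskCache, its constructor parameters and the Purge method are hypothetical names introduced here, not part of the benchmark code or of any library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical task cache whose entries expire after a fixed age.
public sealed class ExpiringTaskCache<TValue>
{
    private sealed class Entry
    {
        public Task<TValue> Operation;
        public DateTime CreatedUtc;
    }

    private readonly ConcurrentDictionary<string, Entry> entries =
        new ConcurrentDictionary<string, Entry>();
    private readonly TimeSpan maxAge;
    private readonly Func<string, Task<TValue>> factory;

    public ExpiringTaskCache(TimeSpan maxAge, Func<string, Task<TValue>> factory)
    {
        this.maxAge = maxAge;
        this.factory = factory;
    }

    public Task<TValue> GetAsync(string key)
    {
        Entry entry;
        if (entries.TryGetValue(key, out entry) &&
            DateTime.UtcNow - entry.CreatedUtc < maxAge)
        {
            // Fresh entry: return the cached task without allocating.
            return entry.Operation;
        }
        // Missing or too old: start a new operation and overwrite the entry.
        entry = new Entry { Operation = factory(key), CreatedUtc = DateTime.UtcNow };
        entries[key] = entry;
        return entry.Operation;
    }

    // Removes expired entries and returns how many were removed.
    // Could be called periodically, for example from a timer.
    public int Purge()
    {
        int removed = 0;
        foreach (var pair in entries)
        {
            Entry ignored;
            if (DateTime.UtcNow - pair.Value.CreatedUtc >= maxAge &&
                entries.TryRemove(pair.Key, out ignored))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

The sketch tolerates small races (Purge may occasionally remove an entry that was refreshed a moment earlier), which only costs an extra request. A size limit with usage counting would need additional bookkeeping that is not shown here.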

I did some benchmarking with and without caching of Tasks. You can find the code at http://pastebin.com/SEr2838A, and the results look like this on my machine (with .NET 4.6):

Caching ZipCodes: 00:00:04.6653104
Gen0: 3560 Gen1: 0 Gen2: 0
Caching Tasks: 00:00:03.9452951
Gen0: 1017 Gen1: 0 Gen2: 0

Answer 3

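The relevant difference shows up when you consider what happens if the method is called multiple times before the cache has been populated.

If you only cache the result, as in the first snippet, then if two (or three, or fifty) calls to the method are made before any of them has finished, they will all start the actual operation that generates the result (in this case, performing a network request). So you now have two, three, fifty, or however many network requests in flight, all of which will put their results in the cache when they finish.

When you cache the task rather than the result of the operation, if a second, third, or fiftieth call is made to this method after someone else has started their request, but before any of those requests has completed, they are all given the same task representing that one network operation (or whatever long-running operation it is). That means you only ever send one network request, or only ever perform one expensive computation, rather than duplicating that work when you have multiple requests for the same result.

Also, consider the case where one request is sent, and when it is 95% done, a second call is made to the method. In the first snippet, since there is no result yet, the second call starts from scratch and does 100% of the work. In the second snippet, the second call is handed a Task that is 95% done, so it gets its result much sooner than with the first approach, and the system as a whole does a lot less work.

In both cases, if the method is never called while the cache is empty and another call has already started the work, there is no meaningful difference between the two approaches.

You can create a fairly simple reproducible example to demonstrate this behavior. Here we have a toy long-running operation, and methods that cache either the result or the Task it returns. When we fire off 5 of the operations all at once, you'll see that result caching performs the long-running operation 5 times, while task caching performs it just once.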

public class AsynchronousCachingSample
{
    private static async Task<string> SomeLongRunningOperation()
    {
        Console.WriteLine("I'm starting a long running operation");
        await Task.Delay(1000);
        return "Result";
    }

    private static ConcurrentDictionary<string, string> resultCache =
        new ConcurrentDictionary<string, string>();
    private static async Task<string> CacheResult(string key)
    {
        string output;
        if (!resultCache.TryGetValue(key, out output))
        {
            output = await SomeLongRunningOperation();
            resultCache.TryAdd(key, output);
        }
        return output;
    }

    private static ConcurrentDictionary<string, Task<string>> taskCache =
        new ConcurrentDictionary<string, Task<string>>();
    private static Task<string> CacheTask(string key)
    {
        Task<string> output;
        if (!taskCache.TryGetValue(key, out output))
        {
            output = SomeLongRunningOperation();
            taskCache.TryAdd(key, output);
        }
        return output;
    }

    public static async Task Test()
    {
        int repetitions = 5;
        Console.WriteLine("Using result caching:");
        await Task.WhenAll(Enumerable.Repeat(false, repetitions)
              .Select(_ => CacheResult("Foo")));

        Console.WriteLine("Using task caching:");
        await Task.WhenAll(Enumerable.Repeat(false, repetitions)
              .Select(_ => CacheTask("Foo")));
    }
}

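It's worth noting that the specific implementation of the second approach you've provided has a few notable properties. It is possible for the method to be called twice in such a way that both calls start the long-running operation before either task completes and the Task representing that operation is added to the cache. So while it would be much harder than with the first snippet, it is possible for the long-running operation to be run twice. More robust locking around checking the cache, starting a new operation, and then populating the cache would be needed to prevent that. If performing the long-running operation multiple times on rare occasions merely wastes a bit of time, the current code is probably fine, but if it is important that the operation never be performed multiple times (say, because it causes side effects), then the current code isn't complete.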
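One common way to get that stronger guarantee without an explicit lock is to store Lazy<Task<T>> values and rely on ConcurrentDictionary.GetOrAdd. This is a swapped-in technique, not the code from the talk or from the question, and the names SingleFlightCache, GetAsync and OperationsStarted are made up for this sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class SingleFlightCache
{
    private static readonly ConcurrentDictionary<string, Lazy<Task<string>>> s_cache =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    private static int s_operationsStarted;
    public static int OperationsStarted { get { return s_operationsStarted; } }

    // Stand-in for the expensive operation; counts how often it is started.
    private static async Task<string> SomeLongRunningOperation(string key)
    {
        Interlocked.Increment(ref s_operationsStarted);
        await Task.Delay(100);
        return "Result for " + key;
    }

    public static Task<string> GetAsync(string key)
    {
        // GetOrAdd may run the value factory more than once under contention,
        // but only one Lazy per key is ever stored and returned. Creating a
        // Lazy is cheap; the operation starts only when Value is first read,
        // and Lazy (default thread-safety mode) runs its factory at most once.
        Lazy<Task<string>> lazy = s_cache.GetOrAdd(
            key,
            k => new Lazy<Task<string>>(() => SomeLongRunningOperation(k)));
        return lazy.Value;
    }

    public static void Main()
    {
        var calls = new Task<string>[50];
        Parallel.For(0, calls.Length, i => { calls[i] = GetAsync("Foo"); });
        Task.WaitAll(calls);
        Console.WriteLine(OperationsStarted); // 1
    }
}
```

Note that this sketch also caches tasks that end up faulted or canceled. Real code would probably remove such entries so that a later call can retry.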

When to cache Tasks?
