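What optimization hints can I give to the compiler/JIT?

Time: 2013-04-30 18:02:17

Tags: c# .net vb.net optimization

I've already profiled, and am now looking to squeeze every possible bit of performance out of my hot spot.

I know about [MethodImplOptions.AggressiveInlining] and the ProfileOptimization class. Are there any others?


[Edit] I just discovered [TargetedPatchingOptOut] as well. Never mind, apparently that one is not needed.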

2 answers:

Answer 0: (score: 29)

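You've exhausted the options added in .NET 4.5 to affect the jitted code directly. The next step is to look at the generated machine code to spot any obvious inefficiencies. Do so with the debugger, but first prevent it from disabling the optimizer: Tools + Options, Debugging, General, untick the "Suppress JIT optimization on module load" option. Set a breakpoint on the hot code, then use Debug + Disassembly to look at it.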
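For reference, the options mentioned so far can be applied roughly as follows. This is only a minimal sketch: the class and method names, the profile directory parameter, and the profile file name "Startup.profile" are hypothetical, and the inlining attribute is a request that the JIT is free to ignore.

```csharp
using System;
using System.Runtime;                  // ProfileOptimization
using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions

public static class JitHints
{
    // Asks the JIT to inline this method even where its heuristics would decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Square(int x)
    {
        return x * x;
    }

    // Forbids inlining, for example to keep a rarely taken path out of a hot loop body.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SlowPath(int x)
    {
        return checked(x * 2);
    }

    // Turns on background JIT driven by a profile recorded during an earlier run.
    // Both the directory and the file name are placeholders for illustration.
    public static void EnableStartupProfile(string profileDirectory)
    {
        ProfileOptimization.SetProfileRoot(profileDirectory);
        ProfileOptimization.StartProfile("Startup.profile");
    }
}
```

Whether either attribute helps can only be judged by measuring and by checking the disassembly as described above.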

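There is not that much to look for; the jitter's optimizer generally does an excellent job. One thing to check is a failed attempt at eliminating an array bounds check; the fixed keyword is an unsafe workaround for that. A corner case is a failed attempt at inlining a method that leaves the jitter unable to use the CPU registers effectively. This is an issue with the x86 jitter, and it is fixed with MethodImplOptions.NoInlining. The optimizer is not terribly good at hoisting invariant code out of loops, but that is something you would almost always think of first when staring at C# code while looking for ways to optimize it.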
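The two loop-related points can be illustrated with a small sketch. The class and method names are made up for this example, and whether the bounds check is really removed depends on the JIT version, so it is worth confirming in the disassembly. The unsafe fixed variant is not shown because it needs the /unsafe compiler switch.

```csharp
public static class LoopPatterns
{
    // Looping directly against data.Length is the pattern the JIT recognizes
    // most reliably: it can prove every index is in range and can usually
    // drop the per-element bounds check.
    public static long Sum(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];
        }
        return sum;
    }

    // The loop-invariant product is hoisted by hand instead of relying on
    // the optimizer to move it out of the loop.
    public static void ScaleInPlace(double[] values, double scale, double offset)
    {
        double factor = scale * offset; // computed once, not per iteration
        for (int i = 0; i < values.Length; i++)
        {
            values[i] *= factor;
        }
    }
}
```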

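The most important thing to know is when you are done and simply cannot hope to make it any faster. You can only really get there by comparing apples and oranges: write the hot code in native code using C++/CLI. Make sure this code is compiled with #pragma unmanaged in effect so it gets the full optimizer love. There is a cost associated with switching from managed to native code execution, so make sure the execution time of the native code is substantial enough. This is not necessarily easy to do, and you certainly have no guarantee of success. Still, knowing that you are done can save you a lot of time stumbling down dead-end alleys.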

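Answer 1: (score: 29)

Yes, there are more tricks :-)

I've actually done quite a bit of research on optimizing C# code. So far, these are the most significant results: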

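  1. Funcs and Actions that are passed directly are often inlined by the JIT compiler. Note that you shouldn't store them in a variable, because they are then called as a delegate. See also this post for more details.
  2. Be careful with overloads. Calling Equals without using IEquatable<T> is usually a bad plan, so if you use, for example, a hash, be sure to implement the right overloads and interfaces, because it will save you a lot of performance.
  3. Generics called from other classes are never inlined. The reason for this is the "magic" outlined here.
  4. If you use a data structure, make sure to try using an array instead :-) Really, these things are extremely fast compared to... well, just about anything, I suppose. I've optimized quite a few things by using my own hash tables and using arrays instead of lists.
  5. In a lot of cases, table lookups are faster than computing things or using constructs like vtable lookups, switches, multiple if statements and even calculations. This is also a good trick if you have branches; failed branch prediction can often become a big pain. See also this post. This is a trick I use quite a lot in C#, and it works great in a lot of cases. Oh, and lookup tables are arrays, of course.
  6. Experiment with making (small) classes structs. Because of the nature of value types, some optimizations are different for structs than for classes. For example, method calls are simpler, because the compiler knows exactly which method is going to be called. Arrays of structs are also usually faster than arrays of classes, because they require one memory operation less per array operation.
  7. Don't use multi-dimensional arrays. While I prefer Foo[], even Foo[][] is normally faster than Foo[,].
  8. If you're copying data, prefer Buffer.BlockCopy over Array.Copy any day of the week (note that Buffer.BlockCopy only works on arrays of primitive types and takes offsets and counts in bytes). Also be careful around strings: string operations can be a performance drain.
  9. There also used to be a guide called "Optimization for the Intel Pentium processor" with a large number of tricks (like shifting or multiplying instead of dividing). While the compiler does a fine job nowadays, this sometimes still helps a bit.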
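Items 2 and 6 can be combined in one sketch: a small struct used as a hash key that implements IEquatable<T>. The type name GridPoint and the hash formula are assumptions chosen for illustration, not something taken from the answer.

```csharp
using System;
using System.Collections.Generic;

// A small value type intended for use as a Dictionary or HashSet key.
public struct GridPoint : IEquatable<GridPoint>
{
    public readonly int X;
    public readonly int Y;

    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Strongly typed overload: EqualityComparer<GridPoint>.Default uses it,
    // so key comparisons inside the collection avoid boxing the struct.
    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    // Keep the object-based overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return obj is GridPoint && Equals((GridPoint)obj);
    }

    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}
```

Without the interface, a struct key falls back to Equals(object), which boxes the argument on every comparison.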

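    Of course, these are just optimizations; the biggest performance gains are usually the result of changing the algorithm and/or the data structure. Be sure to check which options are available to you, and don't restrict yourself too much to the .NET Framework. I also have a natural tendency to distrust the .NET implementation until I've checked the decompiled code myself; there is a lot of stuff that could have been implemented much faster (most of the time for good reasons).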
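As an illustration of the table-lookup trick from item 5 in the list above, here is a minimal sketch that counts set bits through a precomputed 256-entry array instead of testing each bit in a loop. The class name and the choice of bit counting as the example are assumptions; the answer itself does not say which lookups it used.

```csharp
public static class BitCountTable
{
    // Table[b] holds the number of set bits in the byte value b.
    private static readonly byte[] Table = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[256];
        for (int i = 1; i < 256; i++)
        {
            // Bits in i = bits in i / 2 plus the lowest bit of i.
            table[i] = (byte)(table[i >> 1] + (i & 1));
        }
        return table;
    }

    // Four array lookups and three additions, with no data-dependent branches.
    public static int CountBits(uint value)
    {
        return Table[value & 0xFF]
             + Table[(value >> 8) & 0xFF]
             + Table[(value >> 16) & 0xFF]
             + Table[value >> 24];
    }
}
```

A table this small fits comfortably in the L1 cache; for larger tables the cache misses can outweigh the saved computation, so this is worth measuring case by case.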

    HTH


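    Alex pointed out to me that, according to some people, Array.Copy is actually faster. Since I really don't know what has changed over the years, I decided that the only proper course of action was to create a fresh benchmark and put it to the test.

    If you're only interested in the results, scroll down. In most cases, the call to Buffer.BlockCopy clearly outperforms Array.Copy. Tested on .NET 4.5.2 on an Intel Skylake with 16 GB of memory (about 10 GB free).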

    Code (the methods go inside a class; it needs using System; and using System.Diagnostics;):

    static void TestNonOverlapped1(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K];
        byte[] tmp2 = new byte[K];
        for (long i = 0; i < iter; ++i)
        {
            Array.Copy(tmp, tmp2, K);
        }
    }
    
    static void TestNonOverlapped2(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K];
        byte[] tmp2 = new byte[K];
        for (long i = 0; i < iter; ++i)
        {
            Buffer.BlockCopy(tmp, 0, tmp2, 0, K);
        }
    }
    
    static void TestOverlapped1(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K + 16];
        for (long i = 0; i < iter; ++i)
        {
            Array.Copy(tmp, 0, tmp, 16, K);
        }
    }
    
    static void TestOverlapped2(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K + 16];
        for (long i = 0; i < iter; ++i)
        {
            Buffer.BlockCopy(tmp, 0, tmp, 16, K);
        }
    }
    
    static void Main(string[] args)
    {
        for (int i = 0; i < 10; ++i)
        {
            int N = 16 << i;
    
            Console.WriteLine("Block size: {0} bytes", N);
    
            Stopwatch sw = Stopwatch.StartNew();
    
            {
                sw.Restart();
                TestNonOverlapped1(N);
    
                Console.WriteLine("Non-overlapped Array.Copy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForFullGCComplete();
            }
    
            {
                sw.Restart();
                TestNonOverlapped2(N);
    
                Console.WriteLine("Non-overlapped Buffer.BlockCopy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForFullGCComplete();
            }
    
            {
                sw.Restart();
                TestOverlapped1(N);
    
                Console.WriteLine("Overlapped Array.Copy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForFullGCComplete();
            }
    
            {
                sw.Restart();
                TestOverlapped2(N);
    
                Console.WriteLine("Overlapped Buffer.BlockCopy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForFullGCComplete();
            }
    
            Console.WriteLine("-------------------------");
        }
    
        Console.ReadLine();
    }
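
The timing blocks in Main all repeat the same pattern, so they could be factored into a small helper. The following is only a sketch under assumptions: the Timing class and its MeasureMilliseconds method are hypothetical names, and the warm-up run and the GC calls before timing are additions that the benchmark above does not perform, so its published numbers were not produced this way.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action warmupRuns times untimed (so JIT compilation is not measured),
    // forces a collection, then times one more run and returns the elapsed milliseconds.
    public static double MeasureMilliseconds(Action action, int warmupRuns = 1)
    {
        if (action == null) throw new ArgumentNullException("action");

        for (int i = 0; i < warmupRuns; i++)
        {
            action();
        }

        // Start the timed run from a clean heap.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

With such a helper, one measurement would read something like Timing.MeasureMilliseconds(() => TestNonOverlapped1(N)).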
    

    Results on the x86 JIT:

    Block size: 16 bytes
    Non-overlapped Array.Copy: 4267.52 ms
    Non-overlapped Buffer.BlockCopy: 2887.05 ms
    Overlapped Array.Copy: 3305.01 ms
    Overlapped Buffer.BlockCopy: 2670.18 ms
    -------------------------
    Block size: 32 bytes
    Non-overlapped Array.Copy: 1327.55 ms
    Non-overlapped Buffer.BlockCopy: 763.89 ms
    Overlapped Array.Copy: 2334.91 ms
    Overlapped Buffer.BlockCopy: 2158.49 ms
    -------------------------
    Block size: 64 bytes
    Non-overlapped Array.Copy: 705.76 ms
    Non-overlapped Buffer.BlockCopy: 390.63 ms
    Overlapped Array.Copy: 1303.00 ms
    Overlapped Buffer.BlockCopy: 1103.89 ms
    -------------------------
    Block size: 128 bytes
    Non-overlapped Array.Copy: 361.18 ms
    Non-overlapped Buffer.BlockCopy: 219.77 ms
    Overlapped Array.Copy: 620.21 ms
    Overlapped Buffer.BlockCopy: 577.20 ms
    -------------------------
    Block size: 256 bytes
    Non-overlapped Array.Copy: 192.92 ms
    Non-overlapped Buffer.BlockCopy: 108.71 ms
    Overlapped Array.Copy: 347.63 ms
    Overlapped Buffer.BlockCopy: 353.40 ms
    -------------------------
    Block size: 512 bytes
    Non-overlapped Array.Copy: 104.69 ms
    Non-overlapped Buffer.BlockCopy: 65.65 ms
    Overlapped Array.Copy: 211.77 ms
    Overlapped Buffer.BlockCopy: 202.94 ms
    -------------------------
    Block size: 1024 bytes
    Non-overlapped Array.Copy: 52.93 ms
    Non-overlapped Buffer.BlockCopy: 38.84 ms
    Overlapped Array.Copy: 144.39 ms
    Overlapped Buffer.BlockCopy: 154.09 ms
    -------------------------
    Block size: 2048 bytes
    Non-overlapped Array.Copy: 45.64 ms
    Non-overlapped Buffer.BlockCopy: 30.11 ms
    Overlapped Array.Copy: 118.33 ms
    Overlapped Buffer.BlockCopy: 109.16 ms
    -------------------------
    Block size: 4096 bytes
    Non-overlapped Array.Copy: 30.93 ms
    Non-overlapped Buffer.BlockCopy: 30.72 ms
    Overlapped Array.Copy: 119.73 ms
    Overlapped Buffer.BlockCopy: 104.66 ms
    -------------------------
    Block size: 8192 bytes
    Non-overlapped Array.Copy: 30.37 ms
    Non-overlapped Buffer.BlockCopy: 26.63 ms
    Overlapped Array.Copy: 90.46 ms
    Overlapped Buffer.BlockCopy: 87.40 ms
    -------------------------
    

    Results on the x64 JIT:

    Block size: 16 bytes
    Non-overlapped Array.Copy: 1252.71 ms
    Non-overlapped Buffer.BlockCopy: 694.34 ms
    Overlapped Array.Copy: 701.27 ms
    Overlapped Buffer.BlockCopy: 573.34 ms
    -------------------------
    Block size: 32 bytes
    Non-overlapped Array.Copy: 995.47 ms
    Non-overlapped Buffer.BlockCopy: 654.70 ms
    Overlapped Array.Copy: 398.48 ms
    Overlapped Buffer.BlockCopy: 336.86 ms
    -------------------------
    Block size: 64 bytes
    Non-overlapped Array.Copy: 498.86 ms
    Non-overlapped Buffer.BlockCopy: 329.15 ms
    Overlapped Array.Copy: 218.43 ms
    Overlapped Buffer.BlockCopy: 179.95 ms
    -------------------------
    Block size: 128 bytes
    Non-overlapped Array.Copy: 263.00 ms
    Non-overlapped Buffer.BlockCopy: 196.71 ms
    Overlapped Array.Copy: 137.21 ms
    Overlapped Buffer.BlockCopy: 107.02 ms
    -------------------------
    Block size: 256 bytes
    Non-overlapped Array.Copy: 144.31 ms
    Non-overlapped Buffer.BlockCopy: 101.23 ms
    Overlapped Array.Copy: 85.49 ms
    Overlapped Buffer.BlockCopy: 69.30 ms
    -------------------------
    Block size: 512 bytes
    Non-overlapped Array.Copy: 76.76 ms
    Non-overlapped Buffer.BlockCopy: 55.31 ms
    Overlapped Array.Copy: 61.99 ms
    Overlapped Buffer.BlockCopy: 54.06 ms
    -------------------------
    Block size: 1024 bytes
    Non-overlapped Array.Copy: 44.01 ms
    Non-overlapped Buffer.BlockCopy: 33.30 ms
    Overlapped Array.Copy: 53.13 ms
    Overlapped Buffer.BlockCopy: 51.36 ms
    -------------------------
    Block size: 2048 bytes
    Non-overlapped Array.Copy: 27.05 ms
    Non-overlapped Buffer.BlockCopy: 25.57 ms
    Overlapped Array.Copy: 46.86 ms
    Overlapped Buffer.BlockCopy: 47.83 ms
    -------------------------
    Block size: 4096 bytes
    Non-overlapped Array.Copy: 29.11 ms
    Non-overlapped Buffer.BlockCopy: 25.12 ms
    Overlapped Array.Copy: 45.05 ms
    Overlapped Buffer.BlockCopy: 47.84 ms
    -------------------------
    Block size: 8192 bytes
    Non-overlapped Array.Copy: 24.95 ms
    Non-overlapped Buffer.BlockCopy: 21.52 ms
    Overlapped Array.Copy: 43.81 ms
    Overlapped Buffer.BlockCopy: 43.22 ms
    -------------------------