C# method is 100x slower with three returns than with two?

Time: 2018-09-09 06:39:02

Tags: c# performance optimization

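I have a method that behaves strangely when I try to performance test it. If I comment out or disable the return in any one of the if statements, the run time drops from 400 ms to 4 ms, almost as if the code were being compiled away rather than actually run. If disabling one return left the method able to return only true or only false, I could see how the compiler might optimize it to a constant bool and skip the code, but that is not the case here.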

Does anyone know what might be happening, or have a suggestion for a better way to run the test?

My test code:

Vec3 spherePos = new Vec3(43.7527, 75.9756, 0);
double sphereRadisSq = 50 * 50;
Vec3 rayPos = new Vec3(-5.32301, 5.97157, -112.983);
Vec3 rayDir = new Vec3(0.457841, 0.680324, 0.572312);

sw.Reset();
sw.Start();
bool res = false;
for (int i = 0; i < 10000000; i++)
{
   res = Intersect.RaySphereFast(rayPos, rayDir, spherePos, sphereRadisSq);
}      
sw.Stop();
Debug.Log($"testTime: {sw.ElapsedMilliseconds} ms");
Debug.Log(res);

And the static method:

public static bool RaySphereFast(Vec3 _rp, Vec3 _rd, Vec3 _sp, double _srsq) 
{
    double rs = Vec3.DistanceFast(_rp, _sp);
    if (rs < _srsq)
    {
        return (true); // <-- When I disable this one
    }
    Vec3 p = Vec3.ProjectFast(_sp, _rp, _rd);
    double pr = Vec3.Dot(_rd, (p - _rp));
    if (pr < 0)
    {
        return (false); // <--  Or when I disable this one
    }
    double ps = Vec3.DistanceFast(p, _sp);
    if (ps < _srsq) 
    {
        return (true); // <--  Or when I disable this one
    }
    return (false);
}

The Vec3 struct (trimmed down):

public struct Vec3
{
    public Vec3(double _x, double _y, double _z)
    {
        x = _x;
        y = _y;
        z = _z;
    }

    public double x { get; }
    public double y { get; }
    public double z { get; }

    public static double DistanceFast(Vec3 _v0, Vec3 _v1) 
    {
        double x = (_v1.x - _v0.x);
        double y = (_v1.y - _v0.y);
        double z = (_v1.z - _v0.z);
        return ((x * x) + (y * y) + (z * z));
    }

    public static double Dot(Vec3 _v0, Vec3 _v1)
    {
        return ((_v0.x * _v1.x) + (_v0.y * _v1.y) + (_v0.z * _v1.z));
    }

    public static Vec3 ProjectFast(Vec3 _p, Vec3 _a, Vec3 _d) 
    {
        Vec3 ap = _p - _a;
        return (_a + Vec3.Dot(ap, _d) * _d);
    }

    public static Vec3 operator +(Vec3 _v0, Vec3 _v1)
    {
        return (new Vec3(_v0.x + _v1.x, _v0.y + _v1.y, _v0.z + _v1.z));
    }

    public static Vec3 operator -(Vec3 _v0, Vec3 _v1)
    {
        return new Vec3(_v0.x - _v1.x, _v0.y - _v1.y, _v0.z - _v1.z);
    }

    public static Vec3 operator *(double _d1, Vec3 _v0)
    {
        return new Vec3(_d1 * _v0.x, _d1 * _v0.y, _d1 * _v0.z);
    }
}

3 answers:

Answer 0 (score: 5)

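This is most likely happening because, when you comment out a return, the complexity of the method drops below the threshold above which automatic inlining is disabled.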

This inlining is not visible in the generated IL; it is done by the JIT compiler.

We can test this hypothesis by decorating the method in question with the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute.

When I tried this with your code, I got the following results (release, x64 build):

Original code: 302 ms
First return commented out: 2 ms
Decorated with AggressiveInlining: 2 ms

The time with the first return commented out is the same as the time with the method decorated with AggressiveInlining (and the first return left enabled).

Therefore I think my conclusion is correct.
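For reference, here is a minimal sketch of what the decorated method could look like. It assumes the Vec3 struct and the Intersect class name from the question; the struct is reproduced in condensed form only so that the snippet compiles on its own. Note that the attribute is a hint to the JIT rather than a guarantee, so whether inlining actually happens may still depend on the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;

// Condensed copy of the Vec3 struct from the question, so the sketch compiles on its own.
public struct Vec3
{
    public Vec3(double _x, double _y, double _z) { x = _x; y = _y; z = _z; }

    public double x { get; }
    public double y { get; }
    public double z { get; }

    // Squared distance between two points (no square root is taken).
    public static double DistanceFast(Vec3 _v0, Vec3 _v1)
    {
        double dx = _v1.x - _v0.x;
        double dy = _v1.y - _v0.y;
        double dz = _v1.z - _v0.z;
        return (dx * dx) + (dy * dy) + (dz * dz);
    }

    public static double Dot(Vec3 _v0, Vec3 _v1) =>
        (_v0.x * _v1.x) + (_v0.y * _v1.y) + (_v0.z * _v1.z);

    // Projects point _p onto the line through _a with direction _d (assumes _d is normalized).
    public static Vec3 ProjectFast(Vec3 _p, Vec3 _a, Vec3 _d) => _a + Dot(_p - _a, _d) * _d;

    public static Vec3 operator +(Vec3 _v0, Vec3 _v1) =>
        new Vec3(_v0.x + _v1.x, _v0.y + _v1.y, _v0.z + _v1.z);

    public static Vec3 operator -(Vec3 _v0, Vec3 _v1) =>
        new Vec3(_v0.x - _v1.x, _v0.y - _v1.y, _v0.z - _v1.z);

    public static Vec3 operator *(double _d1, Vec3 _v0) =>
        new Vec3(_d1 * _v0.x, _d1 * _v0.y, _d1 * _v0.z);
}

public static class Intersect
{
    // Asks the JIT to inline this method even though it exceeds the usual size heuristic.
    // All three early returns stay enabled; only the attribute is new.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool RaySphereFast(Vec3 _rp, Vec3 _rd, Vec3 _sp, double _srsq)
    {
        double rs = Vec3.DistanceFast(_rp, _sp);
        if (rs < _srsq)
        {
            return true;  // ray origin is inside the sphere
        }
        Vec3 p = Vec3.ProjectFast(_sp, _rp, _rd);
        double pr = Vec3.Dot(_rd, p - _rp);
        if (pr < 0)
        {
            return false; // closest point lies behind the ray origin
        }
        double ps = Vec3.DistanceFast(p, _sp);
        return ps < _srsq; // hit if the closest point is within the radius
    }
}
```

The attribute changes only how the JIT treats the call site; the method returns the same values as before, so it can be swapped into the original timing loop unchanged.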

Answer 1 (score: 1)

Just adding an (obvious) disclaimer to @Matthew Watson's answer:

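The results depend on the .NET version, the JIT version, and so on. FYI, I could not reproduce the difference; in my environment both variants give comparable results.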

I used BenchmarkDotNet with .NET Core 2.1.0; see the details below.

// * Summary *

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-4700MQ CPU 2.40GHz (Max: 1.08GHz) (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=2338346 Hz, Resolution=427.6527 ns, Timer=TSC
.NET Core SDK=2.2.100-preview1-009349
  [Host]     : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT


                 Method |     Mean |     Error |    StdDev |
----------------------- |---------:|----------:|----------:|
 RaySphereFast_Original | 40.06 ns | 0.3693 ns | 0.3455 ns |
 RaySphereFast_NoReturn | 40.46 ns | 0.0860 ns | 0.0805 ns |

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 ns   : 1 Nanosecond (0.000000001 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:00:34 (34.86 sec), executed benchmarks: 2

// * Artifacts cleanup *

Answer 2 (score: 1)

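Something interesting is going on here. As others have pointed out, when you comment out one of the returns, the method RaySphereFast becomes small enough to be inlined, and the JIT does indeed decide to inline it. That in turn lets it inline all the helper methods that RaySphereFast calls. As a result, the loop body ends up containing no calls.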

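Once that happens, the JIT "struct promotes" the various Vec3 instances, and since you have initialized all the fields with constants, the JIT propagates those constants and folds them through the various operations. The JIT therefore realizes that the result of the call will always be true.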

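Since every iteration of the loop returns the same value, the JIT realizes that none of the computations in the loop are actually needed (the result is already known) and removes them all. So in the "fast" version you are timing an empty loop: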

G_M52940_IG04:
       BF01000000           mov      edi, 1
       FFC1                 inc      ecx
       81F980969800         cmp      ecx, 0x989680
       7CF1                 jl       SHORT G_M52940_IG04

In the "slow" version, the call is not inlined, and none of these optimizations kick in:

G_M32193_IG04:
       488D4C2478           lea      rcx, bword ptr [rsp+78H]
       C4617B1109           vmovsd   qword ptr [rcx], xmm9
       C4617B115108         vmovsd   qword ptr [rcx+8], xmm10
       C4617B115910         vmovsd   qword ptr [rcx+16], xmm11
       488D4C2460           lea      rcx, bword ptr [rsp+60H]
       C4617B1121           vmovsd   qword ptr [rcx], xmm12
       C4617B116908         vmovsd   qword ptr [rcx+8], xmm13
       C4617B117110         vmovsd   qword ptr [rcx+16], xmm14
       488D4C2448           lea      rcx, bword ptr [rsp+48H]
       C4E17B1131           vmovsd   qword ptr [rcx], xmm6
       C4E17B117908         vmovsd   qword ptr [rcx+8], xmm7
       C4617B114110         vmovsd   qword ptr [rcx+16], xmm8
       488D4C2478           lea      rcx, bword ptr [rsp+78H]
       488D542460           lea      rdx, bword ptr [rsp+60H]
       4C8D442448           lea      r8, bword ptr [rsp+48H]
       C4E17B101D67010000   vmovsd   xmm3, qword ptr [reloc @RWD64]
       E8D2F8FFFF           call     X:RaySphereFast(struct,struct,struct,double):bool
       8BD8                 mov      ebx, eax
       FFC7                 inc      edi
       81FF80969800         cmp      edi, 0x989680
       7C95                 jl       SHORT G_M32193_IG04

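If you really want to benchmark the speed of RaySphereFast, make sure to call it with different or non-constant arguments on each iteration, and make sure to use the result of each iteration.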
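As a rough illustration of that advice, the sketch below varies the ray origin with the loop counter and counts the hits, so the JIT can neither fold the inputs to constants nor discard the results. The RaySphereBench class, its CountHits and Run methods, and the 0.25 step are hypothetical names and values chosen for this example; Vec3 and RaySphereFast are condensed copies of the code from the question. It uses Console.WriteLine instead of Debug.Log so that it does not depend on Unity.

```csharp
using System;
using System.Diagnostics;

// Condensed copy of the Vec3 struct from the question.
public struct Vec3
{
    public Vec3(double _x, double _y, double _z) { x = _x; y = _y; z = _z; }

    public double x { get; }
    public double y { get; }
    public double z { get; }

    public static double Dot(Vec3 _v0, Vec3 _v1) =>
        (_v0.x * _v1.x) + (_v0.y * _v1.y) + (_v0.z * _v1.z);

    // Squared distance between two points.
    public static double DistanceFast(Vec3 _v0, Vec3 _v1) => Dot(_v1 - _v0, _v1 - _v0);

    public static Vec3 ProjectFast(Vec3 _p, Vec3 _a, Vec3 _d) => _a + Dot(_p - _a, _d) * _d;

    public static Vec3 operator +(Vec3 _v0, Vec3 _v1) =>
        new Vec3(_v0.x + _v1.x, _v0.y + _v1.y, _v0.z + _v1.z);

    public static Vec3 operator -(Vec3 _v0, Vec3 _v1) =>
        new Vec3(_v0.x - _v1.x, _v0.y - _v1.y, _v0.z - _v1.z);

    public static Vec3 operator *(double _d1, Vec3 _v0) =>
        new Vec3(_d1 * _v0.x, _d1 * _v0.y, _d1 * _v0.z);
}

public static class Intersect
{
    public static bool RaySphereFast(Vec3 _rp, Vec3 _rd, Vec3 _sp, double _srsq)
    {
        if (Vec3.DistanceFast(_rp, _sp) < _srsq) return true;
        Vec3 p = Vec3.ProjectFast(_sp, _rp, _rd);
        if (Vec3.Dot(_rd, p - _rp) < 0) return false;
        return Vec3.DistanceFast(p, _sp) < _srsq;
    }
}

// Hypothetical benchmark harness, not part of the original code.
public static class RaySphereBench
{
    public static int CountHits(int iterations)
    {
        Vec3 spherePos = new Vec3(43.7527, 75.9756, 0);
        double sphereRadiusSq = 50 * 50;
        Vec3 rayDir = new Vec3(0.457841, 0.680324, 0.572312);

        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            // The origin depends on i, so the arguments are not compile-time constants.
            Vec3 rayPos = new Vec3(-5.32301 + (i & 1023) * 0.25, 5.97157, -112.983);
            // Every result feeds into hits, so no call can be dropped as dead code.
            if (Intersect.RaySphereFast(rayPos, rayDir, spherePos, sphereRadiusSq))
            {
                hits++;
            }
        }
        return hits;
    }

    public static void Run()
    {
        CountHits(1000); // warm-up call so first-call JIT cost stays outside the timed region

        Stopwatch sw = Stopwatch.StartNew();
        int hits = CountHits(10000000);
        sw.Stop();

        Console.WriteLine($"testTime: {sw.ElapsedMilliseconds} ms, hits: {hits}");
    }
}
```

Calling RaySphereBench.Run() from a Main method prints the elapsed time together with the hit count. The timing now includes the cost of building each rayPos, so it is best read as a comparison between variants of the method rather than as an absolute figure.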