Question

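There are 3 pieces of code that do the same thing, but their performance differs in an x64 release build.

I guess that's because of branch prediction. Can anyone elaborate further?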

Conditional: takes 41 ms

for (int j = 0; j < 10000; j++)
{
    ret = (j * 11 / 3 % 5) + (ret % 11 == 4 ? 2 : 1);
}

Normal: takes 51 ms

for (int j = 0; j < 10000; j++)
{
    if (ret % 11 == 4)
    {
        ret = 2 + (j * 11 / 3 % 5);
    }
    else
    {
        ret = 1 + (j * 11 / 3 % 5);
    }
}

Cached: takes 44 ms

for (int j = 0; j < 10000; j++)
{
    var tmp = j * 11 / 3 % 5;
    if (ret % 11 == 4)
    {
        ret = 2 + tmp;
    }
    else
    {
        ret = 1 + tmp;
    }
}

Answer 1

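EDIT 3

If I go back to the original test with the timing error corrected, I get output similar to this.

Conditional took 67ms

Normal took 83ms

Cached took 73ms

This shows that the ternary/conditional operator can be marginally faster in a for loop. Given the earlier finding that the if block beats the ternary/conditional operator when the logical branch is abstracted out of the loop, we can infer that the compiler is able to make additional optimizations when the conditional/ternary operator is used iteratively, at least in some cases.

It is not clear to me why these optimizations do not apply, or are not applied, to the standard if block. The actual difference is fairly minor and, I would argue, a moot point.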

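EDIT 2

There is actually a glaring error in my test code highlighted here

The Stopwatch is not reset between calls. When I use Stopwatch.Restart instead of Stopwatch.Start and raise the iterations to 1000000000, I get the results

Conditional took 22404ms

Normal took 21403ms

This is more like the result I expected, and it is borne out by the extracted CIL. So the "normal" if is in fact marginally quicker than the ternary/conditional operator when isolated from surrounding code.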
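The timing error described in this edit can be reproduced in isolation. The following is a minimal sketch, not the original test: `StopwatchDemo`, `TimeTwoPhases` and `sleepMs` are hypothetical names, and `Thread.Sleep` merely stands in for the measured calls. It assumes only the documented behaviour that `Stopwatch.Start` resumes timing without clearing the elapsed value, while `Stopwatch.Restart` clears it first.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class StopwatchDemo
{
    // Times two phases with a single Stopwatch.
    // reset == true  -> Restart() before the second phase (elapsed time is cleared)
    // reset == false -> Start() before the second phase (elapsed time keeps accumulating)
    public static (long First, long Second) TimeTwoPhases(bool reset, int sleepMs)
    {
        var stopwatch = new Stopwatch();

        stopwatch.Start();
        Thread.Sleep(sleepMs);              // stand-in for the first measured call
        stopwatch.Stop();
        long first = stopwatch.ElapsedMilliseconds;

        if (reset)
        {
            stopwatch.Restart();            // zero the elapsed time, then start
        }
        else
        {
            stopwatch.Start();              // resume; the first phase is still counted
        }
        Thread.Sleep(sleepMs);              // stand-in for the second measured call
        stopwatch.Stop();
        long second = stopwatch.ElapsedMilliseconds;

        return (first, second);
    }

    public static void Main()
    {
        var accumulated = TimeTwoPhases(reset: false, sleepMs: 50);
        var isolated = TimeTwoPhases(reset: true, sleepMs: 50);

        Console.WriteLine("Start:   first {0}ms, second {1}ms", accumulated.First, accumulated.Second);
        Console.WriteLine("Restart: first {0}ms, second {1}ms", isolated.First, isolated.Second);
    }
}
```

With `Start`, the second reading includes the first phase, which is why the later methods in the uncorrected test appeared progressively slower.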

EDIT

After my investigation below, I would suggest that when using a logical condition to choose between two constants or literals, the conditional/ternary operator can be significantly faster than a standard if block. ~~In my tests, it was approximately twice as fast.~~

~~However, I can't quite work out why.~~ The CIL produced by the normal if is longer, but for both functions the average execution path seems to be six lines, including 3 loads and 1 or 2 jumps~~, any ideas?~~.

Using this code,

using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var stopwatch = new Stopwatch();

        // Small runs first: confirm that all three methods return the same result.
        var conditional = Conditional(10);
        var normal = Normal(10);
        var cached = Cached(10);
        if (new[] { conditional, normal }.Any(x => x != cached))
        {
            throw new Exception();
        }

        stopwatch.Start();
        conditional = Conditional(10000000);
        stopwatch.Stop();
        Console.WriteLine(
            "Conditional took {0}ms",
            stopwatch.ElapsedMilliseconds);

        ////stopwatch.Start(); incorrect
        stopwatch.Restart();
        normal = Normal(10000000);
        stopwatch.Stop();
        Console.WriteLine(
            "Normal took {0}ms",
            stopwatch.ElapsedMilliseconds);

        ////stopwatch.Start(); incorrect
        stopwatch.Restart();
        cached = Cached(10000000);
        stopwatch.Stop();
        Console.WriteLine(
            "Cached took {0}ms",
            stopwatch.ElapsedMilliseconds);

        if (new[] { conditional, normal }.Any(x => x != cached))
        {
            throw new Exception();
        }

        Console.ReadKey();
    }

    static int Conditional(int iterations)
    {
        var ret = 0;
        for (int j = 0; j < iterations; j++)
        {
            ret = (j * 11 / 3 % 5) + (ret % 11 == 4 ? 2 : 1);
        }
        return ret;
    }

    static int Normal(int iterations)
    {
        var ret = 0;
        for (int j = 0; j < iterations; j++)
        {
            if (ret % 11 == 4)
            {
                ret = 2 + (j * 11 / 3 % 5);
            }
            else
            {
                ret = 1 + (j * 11 / 3 % 5);
            }
        }
        return ret;
    }

    static int Cached(int iterations)
    {
        var ret = 0;
        for (int j = 0; j < iterations; j++)
        {
            var tmp = j * 11 / 3 % 5;
            if (ret % 11 == 4)
            {
                ret = 2 + tmp;
            }
            else
            {
                ret = 1 + tmp;
            }
        }
        return ret;
    }
}

~~compiled in x64 release mode, with optimizations, and run without a debugger attached, I get this output,~~

Conditional took 65ms

Normal took 148ms

Cached took 217ms

~~and no exception is thrown.~~

Using ILDASM to disassemble the code, I can confirm that the CIL for the three methods differs, the code for the Conditional method being somewhat shorter.

To really answer the question "why", I would need to understand the compiler's code. I would probably need to know why the compiler was written that way.

You can break this down even further, so that you actually only compare the logic functions and ignore all other activity.

static int Conditional(bool condition, int value)
{
    return value + (condition ? 2 : 1);
}

static int Normal(bool condition, int value)
{
    if (condition)
    {
        return 2 + value;
    }
    return 1 + value;
}

Which you can iterate with

static int Looper(int iterations, Func<bool, int, int> operation)
{
    var ret = 0;
    for (var j = 0; j < iterations; j++)
    {
        var condition = ret % 11 == 4;
        var value = ((j * 11) / 3) % 5;
        ret = operation(condition, value);
    }
    return ret;
}

This test still shows a performance differential, but now the other way round. Simplified IL below.

... Conditional ...
{
     : ldarg.1      // push second arg
     : ldarg.0      // push first arg
     : brtrue.s T   // if first arg is true jump to T
     : ldc.i4.1     // push int32(1)
     : br.s F       // jump to F
    T: ldc.i4.2     // push int32(2)
    F: add          // add either 1 or 2 to second arg
     : ret          // return result
}

... Normal ...
{
     : ldarg.0      // push first arg
     : brfalse.s F  // if first arg is false jump to F
     : ldc.i4.2     // push int32(2)
     : ldarg.1      // push second arg
     : add          // add second arg to 2
     : ret          // return result
    F: ldc.i4.1     // push int32(1)
     : ldarg.1      // push second arg
     : add          // add second arg to 1
     : ret          // return result
}
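The isolated comparison in this answer (two small logic functions driven by a shared loop) could be wired up roughly as follows. This is only a sketch: the class name `IsolatedTest`, the wrappers `RunConditional` and `RunNormal`, and the iteration count are assumptions for illustration, and the timings it prints will vary by machine, runtime and build settings.

```csharp
using System;
using System.Diagnostics;

static class IsolatedTest
{
    // Branch expressed with the conditional (ternary) operator.
    static int Conditional(bool condition, int value)
    {
        return value + (condition ? 2 : 1);
    }

    // Same logic expressed with a plain if block.
    static int Normal(bool condition, int value)
    {
        if (condition)
        {
            return 2 + value;
        }
        return 1 + value;
    }

    // Shared loop: everything except the branch itself is identical for both runs.
    static int Looper(int iterations, Func<bool, int, int> operation)
    {
        var ret = 0;
        for (var j = 0; j < iterations; j++)
        {
            var condition = ret % 11 == 4;
            var value = ((j * 11) / 3) % 5;
            ret = operation(condition, value);
        }
        return ret;
    }

    // Hypothetical wrappers so each variant can be called by name.
    public static int RunConditional(int iterations) => Looper(iterations, Conditional);
    public static int RunNormal(int iterations) => Looper(iterations, Normal);

    public static void Main()
    {
        const int iterations = 10000000;    // assumed count; small enough that j * 11 fits in an int

        // Sanity check: both variants must compute the same value.
        if (RunConditional(10) != RunNormal(10))
        {
            throw new Exception("Variants disagree");
        }

        var stopwatch = Stopwatch.StartNew();
        RunConditional(iterations);
        stopwatch.Stop();
        Console.WriteLine("Conditional took {0}ms", stopwatch.ElapsedMilliseconds);

        stopwatch.Restart();                // Restart, not Start, so the timings stay separate
        RunNormal(iterations);
        stopwatch.Stop();
        Console.WriteLine("Normal took {0}ms", stopwatch.ElapsedMilliseconds);
    }
}
```

Routing both variants through the same delegate-based loop keeps the arithmetic and loop overhead equal, so any remaining difference comes from the two small methods and how they are invoked.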

Answer 2

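"There are 3 pieces of code doing the same thing, but their performance differs"

That is not so surprising, is it? Write things a little differently and you get different timings.

"I guess that's because of branch prediction."

That may explain part of why the first snippet is faster. But note that ?: is still branching. Another thing to note is that it is just one big expression, the ideal territory for an optimizer.

The problem is that you can't look at code like this and conclude that a certain operator is faster or slower. The code around it is at least as important.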

Why does this way of using the conditional operator give better performance than a normal if/else?
