C# microbenchmark: why does resetting the accumulated value make the for loop faster?

Date: 2018-11-22 13:42:31

Tags: c# microbenchmark

Consider the following two functions, ComputeA and ComputeB:

using System;
using System.Diagnostics;

namespace BenchmarkLoop
{
    class Program
    {
        private static double[] _dataRow;
        private static double[] _dataCol;

        // Accumulates into a single variable across all iterations of the outer loop.
        public static double ComputeA(double[] col, double[] row)
        {
            var rIdx = 0;
            var value = 0.0;

            for (var i = 0; i < col.Length; ++i)
            {
                for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                    value += col[cIdx] * row[rIdx];
            }

            return value;
        }

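        // Same loops as ComputeA, but value is reset before each inner loop,
        // so only the sum of the last inner loop is returned.
        public static double ComputeB(double[] col, double[] row)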
        {
            var rIdx = 0;
            var value = 0.0;

            for (var i = 0; i < col.Length; ++i)
            {
                value = 0.0;
                for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                    value += col[cIdx] * row[rIdx];
            }

            return value;
        }

        // Accumulates each inner loop into tmp, then adds tmp to value.
        public static double ComputeC(double[] col, double[] row)
        {
            var rIdx = 0;
            var value = 0.0;

            for (var i = 0; i < col.Length; ++i)
            {
                var tmp = 0.0;
                for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                    tmp += col[cIdx] * row[rIdx];
                value += tmp;
            }

            return value;
        }

        static void Main(string[] args)
        {
            _dataRow = new double[2500];
            _dataCol = new double[50];

            var random = new Random();
            for (var i = 0; i < _dataRow.Length; i++)            
                _dataRow[i] = random.NextDouble();
            for (var i = 0; i < _dataCol.Length; i++)
                _dataCol[i] = random.NextDouble();

            var nRuns = 1000000;

            var stopwatch = new Stopwatch();
            stopwatch.Start();
            for (var i = 0; i < nRuns; i++)
                ComputeA(_dataCol, _dataRow);
            stopwatch.Stop();
            var t0 = stopwatch.ElapsedMilliseconds;

            stopwatch.Reset();
            stopwatch.Start();
            for (int i = 0; i < nRuns; i++)
                ComputeC(_dataCol, _dataRow);
            stopwatch.Stop();
            var t1 = stopwatch.ElapsedMilliseconds;

            Console.WriteLine($"Time ComputeA: {t0} - Time ComputeC: {t1}");
            Console.ReadKey();
        }
    }
}

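They differ only in that the variable value is "reset" before each run of the inner loop. I have run several different kinds of benchmarks, all with "Optimize code" enabled, in 32-bit and 64-bit, and with data arrays of different sizes. ComputeB is always 25% faster. I can reproduce these results with BenchmarkDotNet as well, but I cannot explain them. Any ideas?

I also inspected the generated assembly with Intel VTune Amplifier 2019: the JIT output is identical for both functions, apart from the extra line that resets value: (image: JIT result)

So at the assembly level there is no magic that would make the code faster. Are there other possible explanations for this effect, and how could they be verified?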
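One thing worth keeping in mind when comparing the functions: the reset changes the result as well as the timing. ComputeB returns only the sum of the last inner loop, while ComputeA and ComputeC return the sum over all inner loops. A minimal sketch with small hand-picked inputs shows this; the class name ResetSemantics and the sample values are made up for illustration, and the three method bodies are copied from the listing above.

```csharp
using System;

public static class ResetSemantics
{
    // One accumulator across all outer iterations (as in the question).
    public static double ComputeA(double[] col, double[] row)
    {
        var rIdx = 0;
        var value = 0.0;
        for (var i = 0; i < col.Length; ++i)
            for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                value += col[cIdx] * row[rIdx];
        return value;
    }

    // Accumulator reset before each inner loop (as in the question).
    public static double ComputeB(double[] col, double[] row)
    {
        var rIdx = 0;
        var value = 0.0;
        for (var i = 0; i < col.Length; ++i)
        {
            value = 0.0;
            for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                value += col[cIdx] * row[rIdx];
        }
        return value;
    }

    // Per-iteration temporary, added to the total afterwards (as in the question).
    public static double ComputeC(double[] col, double[] row)
    {
        var rIdx = 0;
        var value = 0.0;
        for (var i = 0; i < col.Length; ++i)
        {
            var tmp = 0.0;
            for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                tmp += col[cIdx] * row[rIdx];
            value += tmp;
        }
        return value;
    }

    public static void Main()
    {
        // Hypothetical sample data: col has N = 2 entries, row has N * N = 4.
        var col = new[] { 1.0, 2.0 };
        var row = new[] { 1.0, 1.0, 3.0, 4.0 };

        Console.WriteLine(ComputeA(col, row)); // (1*1 + 2*1) + (1*3 + 2*4) = 14
        Console.WriteLine(ComputeB(col, row)); // last inner loop only: 1*3 + 2*4 = 11
        Console.WriteLine(ComputeC(col, row)); // 3 + 11 = 14
    }
}
```

With random data, ComputeA and ComputeC can still differ in the last digits, because they add the same products in a different order.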

Here are the BenchmarkDotNet results (the parameter N is the size of _dataCol; the size of _dataRow is always N^2): (image: BenchmarkDotNet results)
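The BenchmarkDotNet setup itself is not shown in the question. As a rough stand-in that uses only Stopwatch, the sketch below times ComputeA and ComputeB for several values of N, with the row array sized N^2 as described. The class name SizeSweep, the chosen values of N, the run counts, the fixed seed, the warm-up pass and the sink variable are all assumptions made for illustration, not the author's configuration. Calling through a delegate also adds a small overhead to both measurements.

```csharp
using System;
using System.Diagnostics;

public static class SizeSweep
{
    public static double ComputeA(double[] col, double[] row)
    {
        var rIdx = 0;
        var value = 0.0;
        for (var i = 0; i < col.Length; ++i)
            for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                value += col[cIdx] * row[rIdx];
        return value;
    }

    public static double ComputeB(double[] col, double[] row)
    {
        var rIdx = 0;
        var value = 0.0;
        for (var i = 0; i < col.Length; ++i)
        {
            value = 0.0;
            for (var cIdx = 0; cIdx < col.Length; ++cIdx, ++rIdx)
                value += col[cIdx] * row[rIdx];
        }
        return value;
    }

    // Returns an array of n random values in [0, 1).
    public static double[] Fill(int n, Random random)
    {
        var data = new double[n];
        for (var i = 0; i < n; i++)
            data[i] = random.NextDouble();
        return data;
    }

    // Times `runs` calls of f in milliseconds. Results are added to sink
    // so the calls cannot be discarded as unused.
    public static double Time(Func<double[], double[], double> f,
                              double[] col, double[] row, int runs, ref double sink)
    {
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < runs; i++)
            sink += f(col, row);
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        var random = new Random(42);   // fixed seed (assumption) for repeatable data
        var sink = 0.0;
        const int runs = 10000;        // assumed run count

        foreach (var n in new[] { 10, 50, 100 })   // assumed sizes
        {
            var col = Fill(n, random);
            var row = Fill(n * n, random);

            // Warm-up pass so JIT compilation is not part of the measurement.
            Time(ComputeA, col, row, 100, ref sink);
            Time(ComputeB, col, row, 100, ref sink);

            var tA = Time(ComputeA, col, row, runs, ref sink);
            var tB = Time(ComputeB, col, row, runs, ref sink);
            Console.WriteLine($"N={n}: ComputeA {tA:F1} ms, ComputeB {tB:F1} ms");
        }

        Console.WriteLine($"sink: {sink}");
    }
}
```

Swapping the order in which the two functions are timed is a cheap additional check that the difference does not come from measurement order.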

And the results comparing ComputeA and ComputeC: (image)

JIT assembly for ComputeA (left) and ComputeC (right): (image)

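The difference is very small: in block 2 the variable tmp is set to 0 (held in register xmm1), and in block 6 tmp is added to the returned result value. So overall, nothing surprising. Only the runtime is magic ;)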

0 answers:

No answers