What is the bottleneck in the Linq.Max implementation?

Time: 2015-07-21 13:26:27

Tags: c# performance linq

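Preamble: I was replacing some code (a manual Max search over an array) with a super sexy one-line call to Linq.Max(), which made me wonder about performance (I often work with large arrays). So I wrote a small test program, because I only trust what I can see, and got this result: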

The size is now of 1 elements
With the loop it took:  00:00:00.0000015 
With Linq it took:      00:00:00.0000288 
The loop is faster: 94,79%
-----------------------------------------
The size is now of 10 elements
With the loop it took:  00:00:00 
With Linq it took:      00:00:00.0000007 
The loop is faster: 100,00%
-----------------------------------------
The size is now of 100 elements
With the loop it took:  00:00:00 
With Linq it took:      00:00:00.0000011 
The loop is faster: 100,00%
-----------------------------------------
The size is now of 1 000 elements
With the loop it took:  00:00:00.0000003 
With Linq it took:      00:00:00.0000078 
The loop is faster: 96,15%
-----------------------------------------
The size is now of 10 000 elements
With the loop it took:  00:00:00.0000063 
With Linq it took:      00:00:00.0000765 
The loop is faster: 91,76%
-----------------------------------------
The size is now of 100 000 elements
With the loop it took:  00:00:00.0000714 
With Linq it took:      00:00:00.0007602 
The loop is faster: 90,61%
-----------------------------------------
The size is now of 1 000 000 elements
With the loop it took:  00:00:00.0007669 
With Linq it took:      00:00:00.0081737 
The loop is faster: 90,62%
-----------------------------------------
The size is now of 10 000 000 elements
With the loop it took:  00:00:00.0070811 
With Linq it took:      00:00:00.0754348 
The loop is faster: 90,61%
-----------------------------------------
The size is now of 100 000 000 elements
With the loop it took:  00:00:00.0788133 
With Linq it took:      00:00:00.7758791 
The loop is faster: 89,84%

In short, Linq is almost 10 times slower, which bothered me, so I looked at the implementation of Max():

public static int Max(this IEnumerable<int> source) {
    if (source == null) throw Error.ArgumentNull("source");
    int value = 0;
    bool hasValue = false;
    foreach (int x in source) {
        if (hasValue) {
            if (x > value) value = x;
        }
        else {
            value = x;
            hasValue = true;
        }
    }
    if (hasValue) return value;
    throw Error.NoElements();
}

As the title already asks: what in this implementation makes it 10 times slower? (And it is not about ForEach, I have checked.)

Edit:

Of course, I am testing in Release mode.

Here is my test code (without the output):

//----------------
private int[] arrDoubles;
//----------------

Stopwatch watch = new Stopwatch();
// Stop at 100 million elements to avoid running out of memory on my laptop
for (int i = 1; i <= 100000000; i = i * 10)
{
    fillArray(i);
    watch.Restart();
    int max = Int32.MinValue; // Reset
    for (int j = 0; j < arrDoubles.Length; j++)
    {
        max = Math.Max(arrDoubles[j], max);
    }
    watch.Stop();

    TimeSpan loopSpan = watch.Elapsed;

    watch.Restart();
    max = Int32.MinValue; // Reset
    max = arrDoubles.Max();
    watch.Stop();

    TimeSpan linqSpan = watch.Elapsed;
}

//-------------------------------------------

private void fillArray(int nbValues)
{
    int Min = Int32.MinValue;
    int Max = Int32.MaxValue;
    Random randNum = new Random();
    arrDoubles = Enumerable.Repeat(0, nbValues).Select(i => randNum.Next(Min, Max)).ToArray();
}

2 Answers:

Answer 0: (score: 10)

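This is probably happening because accessing an array through IEnumerable<> is much slower than accessing it through the actual array type (even when using foreach).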

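The following code demonstrates this. Note how the code in max1() and max2() is identical; the only difference is the type of the array parameter. Both methods are passed the same object during the tests.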

Try running it from a RELEASE build (and not under the debugger, which enables debugging code even for a release build):

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Demo
{
    public class Program
    {
        private static void Main(string[] args)
        {
            var array = new int[100000000];

            var sw = new Stopwatch();

            for (int trial = 0; trial < 8; ++trial)
            {
                sw.Restart();
                for (int i = 0; i < 10; ++i)
                    max1(array);
                var elapsed1 = sw.Elapsed;
                Console.WriteLine("int[] took " + elapsed1);

                sw.Restart();
                for (int i = 0; i < 10; ++i)
                    max2(array);
                var elapsed2 = sw.Elapsed;
                Console.WriteLine("IEnumerable<int> took " + elapsed2);

                Console.WriteLine("\nFirst method was {0} times faster.\n", elapsed2.TotalSeconds / elapsed1.TotalSeconds);
            }
        }

        private static int max1(int[] array)
        {
            int result = int.MinValue;

            foreach (int n in array)
                if (n > result)
                    result = n;

            return result;
        }

        private static int max2(IEnumerable<int> array)
        {
            int result = int.MinValue;

            foreach (int n in array)
                if (n > result)
                    result = n;

            return result;
        }
    }
}

On my PC, the int[] version is about 10 times faster than the IEnumerable<int> version.
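To make the difference more concrete, here is a rough sketch of what the two foreach loops are lowered to. It is hand-written for illustration, not actual compiler output: the class name EnumerationSketch and both method names are made up here. The general shape should hold, though: foreach over a value statically typed as int[] becomes an indexed loop, while foreach over IEnumerable<int> goes through an enumerator with interface calls for every element.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: hypothetical names, hand-written approximation of the lowering.
public static class EnumerationSketch
{
    // Approximately what `foreach (int n in array)` does when the static type
    // is int[]: a plain indexed loop, with no method call per element.
    public static int MaxViaArray(int[] array)
    {
        int result = int.MinValue;
        for (int i = 0; i < array.Length; i++)
        {
            int n = array[i];
            if (n > result)
                result = n;
        }
        return result;
    }

    // Approximately what `foreach (int n in source)` does when the static type
    // is IEnumerable<int>: one enumerator object, then a MoveNext() call and a
    // Current call through the interface for every element.
    public static int MaxViaInterface(IEnumerable<int> source)
    {
        int result = int.MinValue;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                int n = e.Current;
                if (n > result)
                    result = n;
            }
        }
        return result;
    }

    public static void Main()
    {
        int[] data = { 3, 9, -4, 7 };
        Console.WriteLine(MaxViaArray(data));     // same array, accessed directly
        Console.WriteLine(MaxViaInterface(data)); // same array, accessed via the interface
    }
}
```

Both methods return the same value for the same input; only the access path differs, which is the same contrast as max1() and max2() above.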

Answer 1: (score: 2)

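Comparing two integers is cheap at any scale (only a few cycles). Enumeration, however, takes much longer, so you could say LINQ spends more time enumerating than computing, whereas your code can access any number in the array directly.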
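If the one-line call style is worth keeping, one possible workaround is a small extension method that takes the direct-access path when the source really is an int[] and falls back to LINQ otherwise. This is only a sketch under assumptions: MaxFast and MaxExtensions are hypothetical names, not part of LINQ, and whether it recovers the full speed of the manual loop would need to be measured on your own data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the framework.
public static class MaxExtensions
{
    public static int MaxFast(this IEnumerable<int> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        // Fast path: the runtime type is an array, so index it directly.
        int[] array = source as int[];
        if (array == null)
            return source.Max(); // fallback: regular LINQ enumeration

        // Mirror Enumerable.Max() behaviour for an empty sequence.
        if (array.Length == 0)
            throw new InvalidOperationException("Sequence contains no elements");

        int result = array[0];
        for (int i = 1; i < array.Length; i++)
        {
            if (array[i] > result)
                result = array[i];
        }
        return result;
    }
}

public static class MaxFastDemo
{
    public static void Main()
    {
        int[] values = { 12, -5, 40, 7 };
        Console.WriteLine(values.MaxFast());                  // array fast path
        Console.WriteLine(new List<int>(values).MaxFast());   // LINQ fallback
    }
}
```

The call site stays as short as values.Max(), while the array case avoids the per-element interface calls.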