
Question

I have a class, like this:

public class MyClass
{
    public int Value { get; set; }
    public bool IsValid { get; set; }
}

In actual fact it's much larger, but this recreates the problem (weirdness).

I want to get the sum of Value over the instances that are valid. So far, I've found two solutions to this.

The first one is this:

int result = myCollection.Where(mc => mc.IsValid).Select(mc => mc.Value).Sum();

The second one, however, is this:

int result = myCollection.Select(mc => mc.IsValid ? mc.Value : 0).Sum();

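I want the most efficient method. At first, I thought that the second one would be more efficient. Then the theoretical part of me started going: "Well, one is O(n + m + m), the other one is O(n + n). The first one should perform better with more invalid items, while the second one should perform better with fewer." (Here n is the total number of items and m the number of valid ones.) I thought that they would perform equally. EDIT: And then @Martin pointed out that the Where and the Select are combined, so it should actually be O(m + n). However, if you look below, it seems like this is not related.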
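A small sketch of the fused pipeline @Martin describes, on hypothetical data (the FusionDemo class and its counters are illustrative and not part of the original Gist). It counts delegate calls to show that the predicate runs once per item (n calls) and the selector once per valid item (m calls), all in a single pass over the collection:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int Value { get; set; }
    public bool IsValid { get; set; }
}

public static class FusionDemo
{
    // Sums the valid values with Where + Select + Sum and reports how many
    // times each delegate was invoked during that single enumeration.
    public static (int Sum, int PredicateCalls, int SelectorCalls) Run(IEnumerable<MyClass> items)
    {
        int predicateCalls = 0;
        int selectorCalls = 0;

        int sum = items
            .Where(mc => { predicateCalls++; return mc.IsValid; })   // called for every item
            .Select(mc => { selectorCalls++; return mc.Value; })     // called only for valid items
            .Sum();

        return (sum, predicateCalls, selectorCalls);
    }
}
```

With four items of which two are valid, Run reports four predicate calls and two selector calls, which is the O(n + m) behaviour.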

So I put it to the test.

(It's 100+ lines, so I thought it was better to post it as a Gist.)
The results were interesting.

With 0% tie tolerance:

The scales tip in favour of Where + Select, by about 30 points.

How much do you want to be the disambiguation percentage?
0
Starting benchmarking.
Ties: 0
Where + Select: 65
Select: 36

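With 2% tie tolerance:

It's the same, except that for some runs the two were within 2% of each other. I'd say that's a minimum margin of error. Where + Select now has just a ~20 point lead.

How much do you want to be the disambiguation percentage?
2
Starting benchmarking.
Ties: 6
Where + Select: 58
Select: 37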

With 5% tie tolerance:

This is what I'd call my maximum margin of error. It makes things a bit better for Select, but not by much.

How much do you want to be the disambiguation percentage?
5
Starting benchmarking.
Ties: 17
Where + Select: 53
Select: 31

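With 10% tie tolerance:

This is beyond my margin of error, but I'm still interested in the result, because it gives Where + Select the same 20-point lead it has had for a while now.

How much do you want to be the disambiguation percentage?
10
Starting benchmarking.
Ties: 36
Where + Select: 44
Select: 21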

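With 25% tie tolerance:

This is way, way beyond my margin of error, but I'm still interested in the result, because Where + Select still (nearly) keeps its 20-point lead. It seems to be outclassing Select in a distinct few cases, and that's what gives it the lead.

How much do you want to be the disambiguation percentage?
25
Starting benchmarking.
Ties: 85
Where + Select: 16
Select: 0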

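Now, I'm guessing that the 20-point lead comes from the middle, where both are bound to get around the same performance. I could try to log it, but that would be a whole load of information to take in. A graph would be better, I guess.

So that's what I did.

It shows that the Select line stays steady (expected) and that the Select + Where line climbs (expected). However, what puzzles me is why it doesn't meet the Select line at 50 or earlier: in fact, I was expecting it earlier than 50, because an extra enumerator had to be created for the Select and the Where. I mean, this shows the 20-point lead, but it doesn't explain why. This, I guess, is the main point of my question.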

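Why does it behave like this? Should I trust it? If not, should I use the other one or this one?

As @KingKong mentioned in the comments, you can also use the overload of Sum that takes a lambda. So my two options now become: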

First:

int result = myCollection.Where(mc => mc.IsValid).Sum(mc => mc.Value);

Second:

int result = myCollection.Sum(mc => mc.IsValid ? mc.Value : 0);

I'll keep it shorter this time, but here are the results:

How much do you want to be the disambiguation percentage?
0
Starting benchmarking.
Ties: 0
Where: 60
Sum: 41

How much do you want to be the disambiguation percentage?
2
Starting benchmarking.
Ties: 8
Where: 55
Sum: 38

How much do you want to be the disambiguation percentage?
5
Starting benchmarking.
Ties: 21
Where: 49
Sum: 31

How much do you want to be the disambiguation percentage?
10
Starting benchmarking.
Ties: 39
Where: 41
Sum: 21

How much do you want to be the disambiguation percentage?
25
Starting benchmarking.
Ties: 85
Where: 16
Sum: 0

The twenty-point lead is still there, which means it has nothing to do with the Where and Select combination that @Marcin pointed out in the comments.

Thanks for reading through my wall of text! Also, if you're interested, here's the modified version that logs a CSV that Excel can take in.

Answer 1

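Select iterates over the entire collection once and, for each item, performs a conditional branch (checking validity) and a + operation.

Where+Select creates an iterator that skips invalid elements (it doesn't yield them) and performs the + only on valid items.

So the cost of Select is: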

t(s) = n * ( cost(check valid) + cost(+) )

And for Where+Select:

t(ws) = n * ( cost(check valid) + p(valid) * (cost(yield) + cost(+)) )

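where:

p(valid) is the probability that an item in the list is valid.
cost(check valid) is the cost of the branch that checks validity.
cost(yield) is the cost of constructing the new state of the Where iterator, which is more complex than the simple iterator that the Select version uses.

As you can see, for a given n, the Select version is a constant, while the Where+Select version is a linear equation with p(valid) as the variable. The actual cost values determine where the two lines intersect, and since cost(yield) can differ from cost(+), they don't necessarily intersect at p(valid) = 0.5.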

Answer 2

Here's an in-depth explanation of what is causing the timing differences.

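The Sum() function for IEnumerable<int> looks like this:

public static int Sum(this IEnumerable<int> source)
{
    int sum = 0;
    foreach(int item in source)
    {
        sum += item;
    }
    return sum;
}

In C#, foreach is just syntactic sugar for .NET's version of an iterator, IEnumerator<T> (not to be confused with IEnumerable<T>). So the above code is actually translated to roughly this (the compiler also wraps the loop in a try/finally that disposes the enumerator, omitted here):

public static int Sum(this IEnumerable<int> source)
{
    int sum = 0;

    IEnumerator<int> iterator = source.GetEnumerator();
    while(iterator.MoveNext())
    {
        int item = iterator.Current;
        sum += item;
    }
    return sum;
}

Remember, the two lines of code you're comparing are the following:

int result1 = myCollection.Where(mc => mc.IsValid).Sum(mc => mc.Value);
int result2 = myCollection.Sum(mc => mc.IsValid ? mc.Value : 0);

Now here's the kicker:

LINQ uses deferred execution. So although it may appear that result1 iterates over the collection twice, it actually iterates over it only once. The Where() condition is actually applied during the Sum(), inside the call to MoveNext() (this is possible thanks to the magic of yield return).

This means that, for result1, the code inside the while loop,

{
    int item = iterator.Current;
    sum += item;
}

is executed only once for each item with mc.IsValid == true. By comparison, result2 executes that code for every item in the collection. That is why result1 is generally faster.

(Note, though, that calling the Where() condition within MoveNext() still has some small overhead, so if most or all of the items have mc.IsValid == true, result2 will actually be faster!)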

Hopefully it's now clear why result2 is usually slower. Next, I'd like to explain why I said in the comments that these LINQ performance comparisons don't matter.

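Creating a LINQ expression is cheap. Calling delegate functions is cheap. Allocating and looping over an iterator is cheap. But not doing these things at all is even cheaper. So if you find that a LINQ statement is the bottleneck in your program, in my experience rewriting it without LINQ will always make it faster than any of the various LINQ methods.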

So your LINQ workflow should look like this:

Use LINQ everywhere.
Profile.
If the profiler says LINQ is causing a bottleneck, rewrite that piece of code without LINQ.

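Fortunately, LINQ bottlenecks are rare. Heck, bottlenecks are rare. I've written hundreds of LINQ statements over the last few years and have ended up replacing fewer than 1% of them. Most of those were due to LINQ2EF's poor SQL optimization rather than being LINQ's fault.

So, as always, write clear and sensible code first, and wait until after you've profiled to worry about micro-optimizations.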

Answer 3

Interesting. Do you know how Sum(this IEnumerable<TSource> source, Func<TSource, int> selector) is defined? It uses the Select method!

public static int Sum<TSource>(this IEnumerable<TSource> source, Func<TSource, int> selector)
{
    return source.Select(selector).Sum();
}

So, in fact, they should all perform almost the same. I did some quick research myself, and here are the results:

Where -- mod: 1 result: 0, time: 371 ms
WhereSelect -- mod: 1  result: 0, time: 356 ms
Select -- mod: 1  result 0, time: 366 ms
Sum -- mod: 1  result: 0, time: 363 ms
-------------
Where -- mod: 2 result: 4999999, time: 469 ms
WhereSelect -- mod: 2  result: 4999999, time: 429 ms
Select -- mod: 2  result 4999999, time: 362 ms
Sum -- mod: 2  result: 4999999, time: 358 ms
-------------
Where -- mod: 3 result: 9999999, time: 441 ms
WhereSelect -- mod: 3  result: 9999999, time: 452 ms
Select -- mod: 3  result 9999999, time: 371 ms
Sum -- mod: 3  result: 9999999, time: 380 ms
-------------
Where -- mod: 4 result: 7500000, time: 571 ms
WhereSelect -- mod: 4  result: 7500000, time: 501 ms
Select -- mod: 4  result 7500000, time: 406 ms
Sum -- mod: 4  result: 7500000, time: 397 ms
-------------
Where -- mod: 5 result: 7999999, time: 490 ms
WhereSelect -- mod: 5  result: 7999999, time: 477 ms
Select -- mod: 5  result 7999999, time: 397 ms
Sum -- mod: 5  result: 7999999, time: 394 ms
-------------
Where -- mod: 6 result: 9999999, time: 488 ms
WhereSelect -- mod: 6  result: 9999999, time: 480 ms
Select -- mod: 6  result 9999999, time: 391 ms
Sum -- mod: 6  result: 9999999, time: 387 ms
-------------
Where -- mod: 7 result: 8571428, time: 489 ms
WhereSelect -- mod: 7  result: 8571428, time: 486 ms
Select -- mod: 7  result 8571428, time: 384 ms
Sum -- mod: 7  result: 8571428, time: 381 ms
-------------
Where -- mod: 8 result: 8749999, time: 494 ms
WhereSelect -- mod: 8  result: 8749999, time: 488 ms
Select -- mod: 8  result 8749999, time: 386 ms
Sum -- mod: 8  result: 8749999, time: 373 ms
-------------
Where -- mod: 9 result: 9999999, time: 497 ms
WhereSelect -- mod: 9  result: 9999999, time: 494 ms
Select -- mod: 9  result 9999999, time: 386 ms
Sum -- mod: 9  result: 9999999, time: 371 ms

Those results are for the following implementations:

result = source.Where(x => x.IsValid).Sum(x => x.Value);
result = source.Select(x => x.IsValid ? x.Value : 0).Sum();
result = source.Sum(x => x.IsValid ? x.Value : 0);
result = source.Where(x => x.IsValid).Select(x => x.Value).Sum();

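mod means that 1 out of every mod items is invalid: for mod == 1 every item is invalid, for mod == 2 the odd items are invalid, and so on. The collection contains 10000000 items.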
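Before reading much into the timings, it is cheap to confirm that the four implementations agree. This sketch assumes a hypothetical generator, Build(count, mod), that reads the mod rule literally. Item i, counting from 0, is invalid when i % mod == 0, and its Value is i. MarcinJuraszek's actual test data may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int Value { get; set; }
    public bool IsValid { get; set; }
}

public static class Implementations
{
    // Hypothetical data generator: 1 out of every `mod` items is invalid.
    public static List<MyClass> Build(int count, int mod) =>
        Enumerable.Range(0, count)
                  .Select(i => new MyClass { Value = i, IsValid = i % mod != 0 })
                  .ToList();

    // The four variants listed above, evaluated on the same source.
    public static int[] All(IEnumerable<MyClass> source) => new[]
    {
        source.Where(x => x.IsValid).Sum(x => x.Value),
        source.Select(x => x.IsValid ? x.Value : 0).Sum(),
        source.Sum(x => x.IsValid ? x.Value : 0),
        source.Where(x => x.IsValid).Select(x => x.Value).Sum(),
    };
}
```

For Build(9, 3) the valid values are 1, 2, 4, 5, 7 and 8, so all four variants return 27. For mod == 1 they all return 0.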


[Figure: results for a collection with 100000000 items]

As you can see, the Select and Sum results are very consistent across all mod values. However, Where and WhereSelect are not.

Answer 4

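My guess is that the Where version filters out the 0s, so they are never passed to Sum (i.e. you are not executing the addition). This is of course only a guess, because I cannot explain how executing an additional lambda expression and calling multiple methods would outperform a simple addition of 0.

A friend of mine suggested that the 0s in the sum may cause a severe performance penalty because of overflow checks. It would be interesting to see how this performs in an unchecked context.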

Answer 5

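Running the following sample makes it clear to me that Where + Select outperforms Select only when it discards a good portion of the items in the list (about half in my informal tests). In the small example below, I get roughly the same numbers from both samples when the Where skips about 4 million out of 10 million items. I ran it in Release mode and also swapped the order of Where + Select and Select, with the same results.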

using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        int total = 10000000;
        Random r = new Random();
        var list = Enumerable.Range(0, total).Select(i => r.Next(0, 5)).ToList();
        // The first 4 million items are set to 10, which the Where below filters out.
        for (int i = 0; i < 4000000; i++)
            list[i] = 10;

        var sw = new Stopwatch();
        sw.Start();

        int sum = 0;

        sum = list.Where(i => i < 10).Select(i => i).Sum();

        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        // No filtering here, so this sum differs; only the timing is compared.
        sum = list.Select(i => i).Sum();

        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}

Answer 6

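If you need speed, a straightforward loop is probably your best bet. And for tends to be faster than foreach (assuming your collection is random-access, of course).

Here are the timings (in milliseconds) with 10% of the elements invalid: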

Where + Select + Sum:   257
Select + Sum:           253
foreach:                111
for:                    61

And with 90% of the elements invalid:

Where + Select + Sum:   177
Select + Sum:           247
foreach:                105
for:                    58

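Here's my benchmark code...

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class MyClass {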
    public int Value { get; set; }
    public bool IsValid { get; set; }
}

class Program {

    static void Main(string[] args) {

        const int count = 10000000;
        const int percentageInvalid = 90;

        var rnd = new Random();
        var myCollection = new List<MyClass>(count);
        for (int i = 0; i < count; ++i) {
            myCollection.Add(
                new MyClass {
                    Value = rnd.Next(0, 50),
                    // Next(0, 100) returns 0..99, so IsValid is false for percentageInvalid% of the items
                    IsValid = rnd.Next(0, 100) >= percentageInvalid
                }
            );
        }

        var sw = new Stopwatch();
        sw.Restart();
        int result1 = myCollection.Where(mc => mc.IsValid).Select(mc => mc.Value).Sum();
        sw.Stop();
        Console.WriteLine("Where + Select + Sum:\t{0}", sw.ElapsedMilliseconds);

        sw.Restart();
        int result2 = myCollection.Select(mc => mc.IsValid ? mc.Value : 0).Sum();
        sw.Stop();
        Console.WriteLine("Select + Sum:\t\t{0}", sw.ElapsedMilliseconds);
        Debug.Assert(result1 == result2);

        sw.Restart();
        int result3 = 0;
        foreach (var mc in myCollection) {
            if (mc.IsValid)
                result3 += mc.Value;
        }
        sw.Stop();
        Console.WriteLine("foreach:\t\t{0}", sw.ElapsedMilliseconds);
        Debug.Assert(result1 == result3);

        sw.Restart();
        int result4 = 0;
        for (int i = 0; i < myCollection.Count; ++i) {
            var mc = myCollection[i];
            if (mc.IsValid)
                result4 += mc.Value;
        }
        sw.Stop();
        Console.WriteLine("for:\t\t\t{0}", sw.ElapsedMilliseconds);
        Debug.Assert(result1 == result4);

    }

}

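By the way, I concur with Stilgar's guess: the relative speeds of your two cases vary with the percentage of invalid items, simply because the amount of work Sum needs to do varies in the "Where" case.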

Answer 7

Rather than explain this in words, I'm going to take a more mathematical approach.

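Given the code below, which should approximate what LINQ does internally, the relative costs are as follows (N = total items, M = valid items, d = cost of a delegate call, a = cost of an addition):

Select only: Nd + Na
Where + Select: Nd + Md + Ma

To find the point where they cross, we need a little algebra:

Nd + Md + Ma = Nd + Na => M(d + a) = Na => (M/N) = a/(d + a)

This means that for the inflection point to be at 50%, a delegate invocation must cost roughly the same as an addition. Since we know the actual inflection point was at about 60%, we can work backwards and determine that for @ItsNotALie a delegate invocation actually costs about 2/3 as much as an addition. That is surprising, but it's what his numbers say.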

using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int Value { get; set; }
    public bool IsValid { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var set = Enumerable.Range(1, 10000000)
                            .Select(i => new MyClass {Value = i, IsValid = i%2 == 0})
                            .ToList();

        Func<MyClass, int> select = i => i.IsValid ? i.Value : 0;
        Console.WriteLine(
            Sum(                        // Cost: N additions
                Select(set, select)));  // Cost: N delegate
        // Total cost: N * (delegate + addition) = Nd + Na

        Func<MyClass, bool> where = i => i.IsValid;
        Func<MyClass, int> wSelect = i => i.Value;
        Console.WriteLine(
            Sum(                        // Cost: M additions
                Select(                 // Cost: M delegate
                    Where(set, where),  // Cost: N delegate
                    wSelect)));
        // Total cost: N * delegate + M * (delegate + addition) = Nd + Md + Ma
    }

    // Cost: N delegate calls
    static IEnumerable<T> Where<T>(IEnumerable<T> set, Func<T, bool> predicate)
    {
        foreach (var mc in set)
        {
            if (predicate(mc))
            {
                yield return mc;
            }
        }
    }

    // Cost: N delegate calls
    static IEnumerable<int> Select<T>(IEnumerable<T> set, Func<T, int> selector)
    {
        foreach (var mc in set)
        {
            yield return selector(mc);
        }
    }

    // Cost: N additions
    static int Sum(IEnumerable<int> set)
    {
        // unchecked: with this data set the totals exceed int.MaxValue and wrap around;
        // only the relative cost of the two pipelines matters here.
        unchecked
        {
            var sum = 0;
            foreach (var i in set)
            {
                sum += i;
            }

            return sum;
        }
    }
}

Answer 8

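I find it interesting that MarcinJuraszek's results differ from ItsNotALie's. In particular, MarcinJuraszek's results start with all four implementations at the same place, while ItsNotALie's results cross over around the middle. I will explain how this works from the source.

Let us assume there are n elements in total, m of which are valid.

The Sum function is very simple. It just loops through the enumerator: http://typedescriptor.net/browse/members/367300-System.Linq.Enumerable.Sum(IEnumerable%601)

For simplicity, let's assume the collection is a list. Both Select and WhereSelect create a WhereSelectListIterator. This means the actual iterators generated are the same. In both cases, Sum loops over an iterator, the WhereSelectListIterator. The most interesting part of the iterator is the MoveNext method.

Since the iterators are the same, the loops are the same. The only difference is in the bodies of the loops.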

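The bodies of these lambdas have very similar costs. The where clause returns a field value, and the ternary predicate also returns a field value. The select clause returns a field value, and the two branches of the ternary operator return either a field value or a constant. The combined select clause has its branch in the ternary operator, whereas WhereSelect has the branch in MoveNext.

However, all of these operations are fairly cheap. By far the most expensive operation so far is the branch, where a misprediction will cost us.

Another expensive operation here is the Invoke. As Branko Dimitrijevic has shown, invoking a function takes noticeably longer than adding a value.

The checked accumulation in Sum also carries weight. If the processor does not have an arithmetic overflow flag, this check could be costly as well.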

So the interesting costs are:

Case 1 (Where + Select): (n + m) * Invoke + m * checked+=
Case 2 (Select only): n * Invoke + n * checked+=

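So if Invoke costs much more than checked accumulation, case 2 is always better. If they are about even, we will see a balance when about half of the elements are valid.

It looks like checked += has negligible cost on MarcinJuraszek's system but a significant cost on ItsNotALie's and Branko Dimitrijevic's systems. It seems to be most expensive on ItsNotALie's system, since the break-even point there is much higher. Nobody seems to have posted results from a system where the accumulation costs much more than the Invoke.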

Why do Where and Select outperform just Select?
