Question

Today I came across a LINQ query-syntax expression in my project that counts the items in a List matching a specific condition:

int temp =  (from A in pTasks 
             where A.StatusID == (int)BusinessRule.TaskStatus.Pending     
             select A).ToList().Count();

I wanted to refactor it to use Count() with a predicate for better readability, and I assumed it would also perform well, so I wrote:

int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);

But when I timed both with a Stopwatch, the lambda expression always took longer than the query syntax:

Stopwatch s = new Stopwatch();
s.Start();
int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
s.Stop();
Stopwatch s2 = new Stopwatch();
s2.Start();
int temp =  (from A in pTasks 
             where A.StatusID == (int)BusinessRule.TaskStatus.Pending
             select A).ToList().Count();
s2.Stop();

Can someone explain why this happens?

Answer 1

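I have simulated your situation, and yes, there is a difference between the execution times of these queries. The reason, however, is not the syntax. It does not matter whether you use method syntax or query syntax. Both yield the same result, because query expressions are translated into their lambda (method-call) form before they are compiled.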

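However, as you may have noticed, the two queries are not identical. Before compilation, your second query is translated into the following lambda syntax (you can drop ToList() from the query because it is redundant):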

pTasks.Where(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending).Count();

Now we have two LINQ queries in lambda syntax: the one above and this one:

pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
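
To make the equivalence concrete, the three forms discussed so far can be placed side by side. This is only a minimal sketch: the `Item` class and the `CountForms` helper are hypothetical names standing in for the asker's task type, which is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountForms
{
    // Hypothetical stand-in for the asker's task type.
    public class Item { public int StatusID { get; set; } }

    // Query syntax followed by ToList(), as in the original project code.
    public static int QueryWithToList(List<Item> items, int status) =>
        (from a in items where a.StatusID == status select a).ToList().Count();

    // What the query becomes once the redundant ToList() is removed.
    public static int WhereThenCount(List<Item> items, int status) =>
        items.Where(x => x.StatusID == status).Count();

    // The refactored version with the predicate passed to Count.
    public static int CountWithPredicate(List<Item> items, int status) =>
        items.Count(x => x.StatusID == status);

    public static void Main()
    {
        // Nine items with StatusID 0, 1, 2, 0, 1, 2, ... so three of them match 0.
        var items = Enumerable.Range(0, 9)
                              .Select(i => new Item { StatusID = i % 3 })
                              .ToList();

        Console.WriteLine(QueryWithToList(items, 0));    // 3
        Console.WriteLine(WhereThenCount(items, 0));     // 3
        Console.WriteLine(CountWithPredicate(items, 0)); // 3
    }
}
```

All three return the same count; only the work done to reach it differs.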

Now, the question is:
Why is there a difference between the execution times of these two queries?

Let's find the answer:
We can understand the reason for this difference by reviewing these:
- .Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate).Count(this IEnumerable<TSource> source)
and
- Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate);

Here is the implementation of Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static int Count<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    int count = 0;
    foreach (TSource element in source) {
        checked {
            if (predicate(element)) count++;
        }
    }
    return count;
}

And here is Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) 
        throw Error.ArgumentNull("source");
    if (predicate == null) 
        throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) 
        return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) 
        return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) 
        return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

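Let's focus on the Where() implementation. If your collection is a List, it returns a WhereListIterator, whereas Count() with a predicate simply iterates over the source. In my opinion, they have added some speed-up in the implementation of WhereListIterator. After that, we call the Count() method that takes no predicate and only iterates over the filtered collection.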
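
The type dispatch in Where() can be observed by printing the runtime type of the object it returns for different sources. The sketch below is only an illustration: `WhereIteratorTypes`, `Lazy`, and `WhereTypeName` are hypothetical names, and the iterator class names are internal details that vary between .NET versions, so no expected output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereIteratorTypes
{
    // A lazily generated sequence: neither a List<T> nor an array.
    public static IEnumerable<int> Lazy()
    {
        for (int i = 0; i < 5; i++) yield return i;
    }

    // Returns the runtime type name of the object produced by Where.
    public static string WhereTypeName(IEnumerable<int> source) =>
        source.Where(x => x % 2 == 0).GetType().Name;

    public static void Main()
    {
        // Each source is expected to get a different internal iterator class.
        Console.WriteLine(WhereTypeName(new List<int> { 1, 2, 3 }));
        Console.WriteLine(WhereTypeName(new[] { 1, 2, 3 }));
        Console.WriteLine(WhereTypeName(Lazy()));
    }
}
```

If the List and the lazy sequence report different type names, the List-specific path is being taken for the former.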

About that speed-up in the implementation of WhereListIterator:

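I found this question on SO: LINQ performance Count vs Where and Count. You can read the answer there, which explains the performance difference between these two queries. The conclusion is: the Where iterator avoids an indirect virtual table call and calls the iterator methods directly. As you can see in that answer, a call instruction is emitted instead of callvirt, and callvirt is slower than call.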

From the book CLR via C#:

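When the callvirt IL instruction is used to call a virtual instance method, the CLR discovers the actual type of the object being used to make the call and then calls the method polymorphically. In order to determine the type, the variable being used to make the call must not be null. In other words, when compiling this call, the JIT compiler generates code that verifies that the variable's value is not null. If it is null, the callvirt instruction causes the CLR to throw a NullReferenceException. This additional check means that the callvirt IL instruction executes slightly more slowly than the call instruction.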

Answer 2

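As Farhad said, the implementations of Where(x).Count() and Count(x) differ. The first one instantiates an additional iterator, which costs about 30,000 ticks on my PC (regardless of the collection size).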

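Also, ToList is not free. It allocates memory, and that takes time. On my PC it roughly doubles the execution time (so the cost depends linearly on the collection size).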

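Also, debugging requires spin-up time, so it is difficult to measure performance accurately in a single run. I recommend a loop like the one in this example, and then ignoring the first set of results.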

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var pTasks = Task.GetTasks();
            for (int i = 0; i < 5; i++)
            {

                var s1 = Stopwatch.StartNew();
                var count1 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s1.Stop();
                Console.WriteLine(s1.ElapsedTicks);

                var s2 = Stopwatch.StartNew();
                var count2 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).ToList().Count();
                s2.Stop();
                Console.WriteLine(s2.ElapsedTicks);

                var s3 = Stopwatch.StartNew();
                var count3 = pTasks.Where(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending).Count();
                s3.Stop();
                Console.WriteLine(s3.ElapsedTicks);


                var s4 = Stopwatch.StartNew();
                var count4 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).Count();
                s4.Stop();
                Console.WriteLine(s4.ElapsedTicks);

                var s5 = Stopwatch.StartNew();
                var count5 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s5.Stop();
                Console.WriteLine(s5.ElapsedTicks);
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }

    public class Task
    {
        public static IEnumerable<Task> GetTasks()
        {
            for (int i = 0; i < 10000000; i++)
            {
                yield return new Task { StatusID = i % 3 };
            }
        }

        public int StatusID { get; set; }
    }

    public class BusinessRule
    {
        public enum TaskStatus
        {
            Pending,
            Other
        }
    }
}
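
The "loop and ignore the first results" advice can also be packaged into a small helper. The following is a minimal sketch under stated assumptions, not a rigorous benchmark: `CountBenchmark` and `AverageTicks` are hypothetical names, and the source is materialized into a List of plain ints up front (unlike the lazy `GetTasks()` above) so that element creation is not part of any timing.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class CountBenchmark
{
    // Runs the action once as a warm-up (not timed), then returns the
    // average elapsed Stopwatch ticks over the timed runs.
    public static long AverageTicks(Func<int> action, int runs)
    {
        action(); // warm-up: JIT and first-call costs stay outside the timing

        long total = 0;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            total += sw.ElapsedTicks;
        }
        return total / runs;
    }

    public static void Main()
    {
        // Built once, so every measurement iterates over the same List.
        List<int> statuses = Enumerable.Range(0, 1000000).Select(i => i % 3).ToList();

        Console.WriteLine(AverageTicks(() => statuses.Count(x => x == 0), 5));
        Console.WriteLine(AverageTicks(() => statuses.Where(x => x == 0).Count(), 5));
        Console.WriteLine(AverageTicks(() => statuses.Where(x => x == 0).ToList().Count, 5));
    }
}
```

Run it in a Release build without a debugger attached. The numbers will still vary between machines and runtime versions.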

LINQ lambda vs. query syntax performance
