Why is using an object property in LINQ slower than using a primitive?

Posted: 2010-09-17 19:10:33

Tags: c# linq performance

Why does this LINQ query (Id is a property of type long on the Structure object):

IList<Structure> theStructures = new List<Structure>();
public int GetChildrenSlow(Structure aStructure){
   IEnumerable<Structure> childrenQuery =
                         from structure in theStructures
                         where structure.ParentStructureId == aStructure.Id
                         select structure;
   int count = childrenQuery.Count();
   //Functionality continues...
   return count;
}

run slower than this one:

IList<Structure> theStructures = new List<Structure>();
public int GetChildrenFast(long aStructureId){
   IEnumerable<Structure> childrenQuery =
                         from structure in theStructures
                         where structure.ParentStructureId == aStructureId
                         select structure;
   int count = childrenQuery.Count();
   //Functionality continues...
   return count;
}

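I'm making this call thousands of times (recursively), and using the property is much slower than using the long directly. If I pull the Id out and store it in a long variable before executing the LINQ query, the speed is almost equal to that of GetChildrenFast. Why is using an object property in LINQ slower than using a primitive?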

Working example:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApplication1
{
   class Structure
   {
      public int Id
      {
         get; set;
      }

      public int ParentStructureId
      {
         get; set;
      }
   }

   class Program
   {
      private IList<Structure> theStructures = new List<Structure>();
      public Structure FirstStructure
      {
         get; set;
      }

      private int FastCountStructureChildren(long aStructureId)
      {
         IEnumerable<Structure> childrenQuery =
                         from structure in theStructures
                         where structure.ParentStructureId == aStructureId
                         select structure;

         int count = childrenQuery.Count();
         foreach(Structure childStructure in childrenQuery)
         {
            count += FastCountStructureChildren(childStructure.Id);
         }
         return count;
      }

      private int SlowCountStructureChildren(Structure aStructure)
      {
         IEnumerable<Structure> childrenQuery =
                         from structure in theStructures
                         where structure.ParentStructureId == aStructure.Id
                         select structure;

         int count = childrenQuery.Count();
         foreach(Structure childStructure in childrenQuery)
         {
            count += SlowCountStructureChildren(childStructure);
         }
         return count;
      }

      public void BuildStructure()
      {
         FirstStructure = new Structure{Id = 0, ParentStructureId = -1};
         theStructures.Add(FirstStructure);
         //The loop only goes to 6000 because any more than that causes
         //a StackOverflowException on my development machine.
         for(int i=1; i<6000; i++)
         {
            Structure newStructure = new Structure{Id = i,ParentStructureId = i - 1};
            theStructures.Add(newStructure);
         }
      }

      static void Main(string[] args)
      {
         Program program = new Program();
         program.BuildStructure();

         Stopwatch fastStopwatch = new Stopwatch();
         fastStopwatch.Start();
         program.FastCountStructureChildren(0);
         fastStopwatch.Stop();

         Stopwatch slowStopwatch = new Stopwatch();
         slowStopwatch.Start();
         program.SlowCountStructureChildren(program.FirstStructure);
         slowStopwatch.Stop();

         Console.WriteLine("Fast time: " + fastStopwatch.Elapsed);
         Console.WriteLine("Slow time: " + slowStopwatch.Elapsed);
         Console.ReadLine();
      }
   }
}

5 Answers:

Answer 0 (score: 2)

Running the full example as provided:

Fast time: 00:00:01.6187793
Slow time: 00:00:01.3977344

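The slow version is only slower for me when I run in Debug mode. That's because in Debug mode methods are never inlined, and NOPs are scattered everywhere so that you can break anywhere, for example inside the Id getter.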
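Because the difference only appears with the JIT optimizer off, it can help to check the build before trusting a timing. A minimal sketch, assuming a hypothetical BenchmarkGuard helper: it reads the DebuggableAttribute the compiler places on the assembly, whose IsJITOptimizerDisabled property is true in a typical Debug build.

```csharp
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper (not part of the question's code): reports whether
// timings taken in this process are likely to be misleading.
public static class BenchmarkGuard
{
    // True when the assembly was compiled with JIT optimizations disabled,
    // as in a typical Debug build. If the attribute is absent, the JIT
    // optimizes normally.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        DebuggableAttribute attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    // Returns a short verdict to print before running a Stopwatch benchmark.
    public static string Describe(Assembly assembly)
    {
        if (IsJitOptimizerDisabled(assembly))
            return "Warning: JIT optimizations are disabled (Debug build?)";
        if (Debugger.IsAttached)
            return "Warning: a debugger is attached";
        return "OK to benchmark";
    }
}
```

Calling BenchmarkGuard.Describe(typeof(Program).Assembly) at the top of Main would flag the Debug-mode case this answer describes.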


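Since you clearly care about speed, I'll point out an unrelated inefficiency: you run the query twice, once for the count and once to iterate over the children. Running it only once (and incrementing the count by 1 inside the loop) should speed things up.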
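The single-pass version suggested above could look roughly like this. It is a sketch under the question's setup, with a minimal stand-in Structure class and a hypothetical SinglePassCounter wrapper; the query is enumerated once per call and each child contributes 1 plus its own descendants.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Structure class.
public class Structure
{
    public int Id { get; set; }
    public int ParentStructureId { get; set; }
}

// Hypothetical wrapper around the question's theStructures list.
public class SinglePassCounter
{
    private readonly IList<Structure> theStructures;

    public SinglePassCounter(IList<Structure> structures)
    {
        theStructures = structures;
    }

    // Counts all descendants of aStructureId. The Where query is enumerated
    // exactly once per call; there is no separate Count() pass.
    public int CountStructureChildren(long aStructureId)
    {
        int count = 0;
        foreach (Structure child in theStructures.Where(s => s.ParentStructureId == aStructureId))
        {
            // 1 for the child itself, plus everything below it.
            count += 1 + CountStructureChildren(child.Id);
        }
        return count;
    }
}
```

This halves the number of scans per call, but each call still walks the whole list, so the overall cost remains quadratic for a deep chain.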


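Incidentally, the way I usually handle this: if it makes sense to call the GetChildren method directly with an id, provide two overloads. Otherwise, provide a single (Structure) overload and read the id before the query, e.g. long id = aStructure.Id;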
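A sketch of the two-overload pattern, assuming a hypothetical StructureRepository class and a stand-in Structure with long ids as in the question's first snippet. The Structure overload reads Id once and delegates, so the predicate only ever compares against a long.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Structure class (long ids).
public class Structure
{
    public long Id { get; set; }
    public long ParentStructureId { get; set; }
}

// Hypothetical class holding the question's theStructures list.
public class StructureRepository
{
    private readonly IList<Structure> theStructures;

    public StructureRepository(IList<Structure> structures)
    {
        theStructures = structures;
    }

    // Overload for callers that already have an id.
    public int GetChildren(long aStructureId)
    {
        return theStructures.Count(s => s.ParentStructureId == aStructureId);
    }

    // Overload for callers holding a Structure: read the Id once,
    // then delegate, so the lambda never touches the property.
    public int GetChildren(Structure aStructure)
    {
        long id = aStructure.Id;
        return GetChildren(id);
    }
}
```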

Answer 1 (score: 1)

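Well, even if the property access is inlined, I suspect it still needs a null check on every iteration. That's an extra condition, which could, for example, throw off branch prediction.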

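It would be interesting to play with a complete example, but I suspect it's simply the fact that you're doing a bit of extra work on every delegate call. It's also possible that this "little bit extra" has prevented some other delegate-related inlining, causing a kind of domino effect on performance.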
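The "extra work on every delegate call" can be made visible by instrumenting the getter. This is an illustrative sketch, not a timing benchmark: CountingStructure and PredicateCost are hypothetical names, and the counter shows that the property version calls the getter once per element while the hoisted version calls it once per query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical instrumented Structure: counts how often the Id getter runs.
public class CountingStructure
{
    private long id;
    public int IdReads { get; private set; }

    public long Id
    {
        get { IdReads++; return id; }
        set { id = value; }
    }

    public long ParentStructureId { get; set; }
}

public static class PredicateCost
{
    // Property in the predicate: the captured reference is dereferenced and
    // the getter runs once per element enumerated.
    public static int CountWithProperty(IList<CountingStructure> all, CountingStructure parent)
    {
        return all.Count(s => s.ParentStructureId == parent.Id);
    }

    // Id hoisted into a local: the getter runs once per call.
    public static int CountWithLocal(IList<CountingStructure> all, CountingStructure parent)
    {
        long parentId = parent.Id;
        return all.Count(s => s.ParentStructureId == parentId);
    }
}
```

With a 10-element list, the first method reads parent.Id 10 times and the second reads it once; both return the same count.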

Answer 2 (score: 0)

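Long is a struct, which has a different construction and memory footprint from an object; the object is obviously slower and, I believe, larger.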

Answer 3 (score: 0)

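In "Functionality continues...", do you use childrenQuery again? Are you aware that you re-enumerate theStructures every time? Don't enumerate a large data set more than once, and the per-item cost of the property access won't matter much.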

IList<Structure> theStructures = new List<Structure>(); 
ILookup<int, Structure> byParentId = null;

public int GetChildren(Structure aStructure){
   if (byParentId == null)
   {
     byParentId = theStructures.ToLookup(x => x.ParentStructureId);
   }
   List<Structure> children = byParentId[aStructure.Id].ToList();
   int count = children.Count; 
   //Functionality continues...
   return count;
} 
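Carrying the lookup idea through to the recursive count from the question might look like the sketch below (LookupCounter is a hypothetical name; Structure is a minimal stand-in with int ids as in the working example). The index is built in one pass, and each call then touches only the direct children of its node, so the whole tree is visited once instead of being rescanned per node.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Structure class.
public class Structure
{
    public int Id { get; set; }
    public int ParentStructureId { get; set; }
}

// Hypothetical counter built on ToLookup.
public class LookupCounter
{
    private readonly ILookup<int, Structure> byParentId;

    // One pass over the list to group structures by their parent id.
    public LookupCounter(IEnumerable<Structure> structures)
    {
        byParentId = structures.ToLookup(s => s.ParentStructureId);
    }

    // The lookup returns an empty sequence for ids with no children,
    // so leaves need no special case.
    public int CountDescendants(Structure aStructure)
    {
        int count = 0;
        foreach (Structure child in byParentId[aStructure.Id])
        {
            count += 1 + CountDescendants(child);
        }
        return count;
    }
}
```

Recursion depth is unchanged, so a 6000-deep chain can still overflow the stack as in the question.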

Answer 4 (score: 0)

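Because C# allows side effects, the compiler cannot easily determine statically that a property value is safe to cache. For example, suppose this were your code: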

public IEnumerable<Structure> FetchChildren(Structure aStructure)
{
    for (int i = 0; i < 10; i++)
    {
        // Side effect: each step of the enumeration mutates aStructure.Id.
        aStructure.Id++;
        yield return GetChild(aStructure.Id);
    }
}

public int GetChildrenSlow(Structure aStructure){
   IEnumerable<Structure> childrenQuery =
                         from structure in FetchChildren(aStructure)
                         where structure.ParentStructureId == aStructure.Id
                         select structure;
   int count = childrenQuery.Count();
   //Functionality continues...
   return count;
}

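As you can see, aStructure.Id changes while you are enumerating. Granted, in your case none of your enumeration code has side effects, but the C# compiler isn't smart enough to know that. Moreover, the enumeration isn't the only thing that can have side effects. For example: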

IList<Structure> theStructures = new List<Structure>();
public int GetChildrenSlow(Structure aStructure){
   IEnumerable<Structure> childrenQuery =
        theStructures.Where(s => s.ParentStructureId == aStructure.Id++);
   int count = childrenQuery.Count();
   //Functionality continues...
   return count;
}

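And there's always multithreading to mess things up. Because the value might be mutated, the property has to be read again every time it is checked.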
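The semantic difference that stops the compiler from caching the property can be shown with deferred execution. In this sketch (CaptureDemo and its methods are hypothetical names), the live query captures the aStructure reference and rereads Id on each enumeration, while the snapshot query copies Id once, so the two disagree after the Id is mutated.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Structure class.
public class Structure
{
    public int Id { get; set; }
    public int ParentStructureId { get; set; }
}

public static class CaptureDemo
{
    // The lambda captures the aStructure reference, so aStructure.Id is
    // read again for every element, on every enumeration.
    public static IEnumerable<Structure> ChildrenLive(IEnumerable<Structure> all, Structure aStructure)
    {
        return all.Where(s => s.ParentStructureId == aStructure.Id);
    }

    // The Id is copied once when the query is built; later changes to
    // aStructure.Id are not seen. This is only equivalent to the live
    // version if nothing mutates the Id in the meantime.
    public static IEnumerable<Structure> ChildrenSnapshot(IEnumerable<Structure> all, Structure aStructure)
    {
        int id = aStructure.Id;
        return all.Where(s => s.ParentStructureId == id);
    }
}
```

Setting parent.Id to a different value between two Count() calls changes the live result but not the snapshot result, which is exactly the behavior change an automatic cache would introduce.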