Question

I was doing some performance measurements and ran into something that seems quite strange to me. I timed the following two functions:

      private static void DoOne()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < A.Count; c++) s += A[c];
         }
      }

      private static void DoTwo()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         IList<int> L = A;
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < L.Count; c++) s += L[c];
         }
      }

Even when compiling in release mode, the timing results consistently show that DoTwo takes more than 100 times longer than DoOne:

 DoOne took 0.06171706 seconds.
 DoTwo took 8.841709 seconds.

Given that List directly implements IList, I was very surprised by the result. Can anyone clarify this behavior?

The gory details

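In response to the questions raised in the answers, here is the full code. (The original post also linked an image of the project build preferences; that image link is dead.)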

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Collections;

namespace TimingTests
{
   class Program
   {
      static void Main(string[] args)
      {
         Stopwatch SW = new Stopwatch();
         SW.Start();
         DoOne();
         SW.Stop();

         Console.WriteLine(" DoOne took {0} seconds.", ((float)SW.ElapsedTicks) / Stopwatch.Frequency);
         SW.Reset();
         SW.Start();
         DoTwo();
         SW.Stop();

         Console.WriteLine(" DoTwo took {0} seconds.", ((float)SW.ElapsedTicks) / Stopwatch.Frequency);

      }

      private static void DoOne()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         int s=0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < A.Count; c++) s += A[c];
         }

      }
      private static void DoTwo()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         IList<int> L = A;
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < L.Count; c++) s += L[c];
         }

      }
   }
}

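Thanks for all the good answers (especially @kentaromiura). I would close this question, though I feel we are still missing an important part of the puzzle. Why would accessing a class through an interface it implements be so much slower? The only difference I can see is that calling a function through an interface implies using virtual tables, while normally the function can be called directly. To see whether this is the case, I made a couple of changes to the code above. First, I introduced two almost identical classes: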

      public class VC
      {
         public virtual int f() { return 2; }
         public virtual int Count { get { return 200; } }
      }

      public class C
      {
         public int f() { return 2; }
         public int Count { get { return 200; } }
      }

As you can see, VC uses virtual functions while C does not. Now for DoOne and DoTwo:

      private static void DoOne()
      {
         C a = new C();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < a.Count; c++) s += a.f();
         }
      }

      private static void DoTwo()
      {
         VC a = new VC();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < a.Count; c++) s += a.f();
         }
      }

And indeed:

DoOne took 0.01287789 seconds.
DoTwo took 8.982396 seconds.

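This is even scarier: virtual function calls are roughly 700 times slower? So, a few questions for the community:

Can you reproduce this? (Given that everyone else also saw worse performance with the interface, but nowhere near as bad as mine.)
Can you explain it?
(This may be the most important one.) Can you think of a way to avoid it?

Boaz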

Answer 1

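A note to those out there who are trying to benchmark things like this.

Do not forget that the code is not JIT-compiled until the first time it runs. That means that the first time you run a method, the cost of running it can be dominated by the time spent loading the IL, analyzing the IL, and jitting it into machine code, particularly if it is a simple method.

If what you are trying to do is compare the "marginal" runtime cost of two methods, it is a good idea to run both of them twice and consider only the second runs for comparison purposes.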
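
The advice above can be sketched as a small harness. This is only an illustration under assumptions: the names WarmupBenchmark, TimeSecondRun, SumViaList and SumViaIList are hypothetical and do not come from the original posts, and the loops simply mirror DoOne and DoTwo from the question. Each method is run once untimed, so the first-call JIT cost is paid before the measured run.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class WarmupBenchmark
{
    // Hypothetical helper: the first call is a warm-up (it triggers JIT
    // compilation); only the second call is timed.
    public static TimeSpan TimeSecondRun(Action action)
    {
        action();                               // first run: pays the JIT cost
        Stopwatch sw = Stopwatch.StartNew();
        action();                               // second run: the one we measure
        sw.Stop();
        return sw.Elapsed;
    }

    // Same loop as DoOne in the question, but returning the sum so the
    // work has an observable result.
    public static int SumViaList(List<int> list)
    {
        int s = 0;
        for (int j = 0; j < 100000; j++)
        {
            for (int c = 0; c < list.Count; c++) s += list[c];
        }
        return s;
    }

    // Same loop as DoTwo in the question: every access goes through IList<int>.
    public static int SumViaIList(IList<int> list)
    {
        int s = 0;
        for (int j = 0; j < 100000; j++)
        {
            for (int c = 0; c < list.Count; c++) s += list[c];
        }
        return s;
    }

    public static void Main()
    {
        List<int> a = new List<int>();
        for (int i = 0; i < 200; i++) a.Add(i);

        TimeSpan one = TimeSecondRun(() => SumViaList(a));
        TimeSpan two = TimeSecondRun(() => SumViaIList(a));

        Console.WriteLine("List<int>  (second run): {0} ms", one.TotalMilliseconds);
        Console.WriteLine("IList<int> (second run): {0} ms", two.TotalMilliseconds);
    }
}
```

Build in release mode and run outside the debugger, as in the question. The absolute numbers will depend on the machine and the runtime version, so only the comparison between the two second runs is meaningful.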

Answer 2

Profiling one step at a time:

Tested with Snippet Compiler.

Results using your code:

0.043s vs 0.116s

Eliminating the temporary L:

0.043s vs 0.116s - no influence

Caching A.Count in cmax in both methods:

0.041s vs 0.076s

     IList<int> A = new List<int>();
     for (int i = 0; i < 200; i++) A.Add(i);

     int s = 0;
     for (int j = 0; j < 100000; j++)
     {
        for (int c = 0, cmax = A.Count; c < cmax; c++) s += A[c];
     }

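Now I will try to slow down DoOne. First attempt: casting to IList before the Add:

for (int i = 0; i < 200; i++) ((IList<int>)A).Add(i);

0.041s vs 0.076s - so Add has no influence

So the only place left where the slowdown can occur is s += A[c]; so I tried this:

s += ((IList<int>)A)[c];

0.075s vs 0.075s - ta-da!

So it seems that accessing Count or indexing an element is slower through the interface version.

Edit: just for fun, have a look at this:

 for (int c = 0, cmax = A.Count; c < cmax; c++) s += ((List<int>)A)[c];

0.041s vs 0.050s

So it is not a casting problem, but a reflection one!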
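
The timings above suggest one possible way to limit the cost when a method only receives an IList<int>: check once, outside the loop, whether the object is really a List<int>, and index through the concrete type when it is. The following is a minimal sketch under that assumption. ListFastPath and Sum are hypothetical names, not part of the original posts, and whether the fast path pays off depends on the runtime and on how often the argument really is a List<int>.

```csharp
using System;
using System.Collections.Generic;

public static class ListFastPath
{
    // Sums the elements, using the concrete List<int> when one is available
    // and falling back to interface calls otherwise.
    public static int Sum(IList<int> items)
    {
        int s = 0;
        List<int> concrete = items as List<int>;   // single type check, outside the loop
        if (concrete != null)
        {
            // Calls on the concrete type, which the JIT is free to inline.
            for (int c = 0, cmax = concrete.Count; c < cmax; c++) s += concrete[c];
        }
        else
        {
            // Any other IList<int> implementation: interface calls, Count cached.
            for (int c = 0, cmax = items.Count; c < cmax; c++) s += items[c];
        }
        return s;
    }

    public static void Main()
    {
        IList<int> a = new List<int>();
        for (int i = 0; i < 200; i++) a.Add(i);

        Console.WriteLine(Sum(a));                      // List<int> path, prints 19900
        Console.WriteLine(Sum(new int[] { 1, 2, 3 }));  // fallback path (int[] implements IList<int>), prints 6
    }
}
```

Both branches return the same result; the only difference is the static type used for Count and the indexer inside the loop.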

Answer 3

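First, I want to thank everyone for their answers. They were really essential on the path to figuring out what was going on. Special thanks go to @kentaromiura, who found the key needed to get to the bottom of things.

The source of the slowdown when using List<T> via the IList<T> interface is the JIT compiler's inability to inline the Item property get function. The use of virtual tables, caused by accessing the list through its IList interface, prevents that from happening.

As proof, I wrote the following code: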

      public class VC
      {
         public virtual int f() { return 2; }
         public virtual int Count { get { return 200; } }
      }

      public class C
      {
         // The MethodImpl attribute needs: using System.Runtime.CompilerServices;
         //[MethodImpl(MethodImplOptions.NoInlining)]
         public int f() { return 2; }
         public int Count
         {
            //[MethodImpl(MethodImplOptions.NoInlining)]
            get { return 200; }
         }
      }

and modified the DoOne and DoTwo methods to the following:

      private static void DoOne()
      {
         C c = new C();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int i = 0; i < c.Count; i++) s += c.f();
         }

      }
      private static void DoTwo()
      {
         VC c = new VC();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int i = 0; i < c.Count; i++) s += c.f();
         }

      }

Now the function timings are very similar to before:

 DoOne took 0.01273598 seconds.
 DoTwo took 8.524558 seconds.

Now, if you uncomment the MethodImpl attributes in class C (forcing the JIT not to inline), the timings become:

DoOne took 8.734635 seconds.
DoTwo took 8.887354 seconds.

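Voila - the two methods now take almost the same time. You can see that DoOne is still slightly faster, which is consistent with the additional overhead of a virtual function call.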
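
For anyone who wants to rerun the experiment, here is a self-contained sketch that puts the three variants side by side. It is an arrangement of the code above under assumptions, not the exact program that produced the timings: the class names Inlinable, NotInlinable and VirtualCalls, the InliningExperiment wrapper, and the Report helper are hypothetical, and each variant is warmed up once before it is timed.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public class Inlinable                      // like class C with the attributes commented out
{
    public int F() { return 2; }
    public int Count { get { return 200; } }
}

public class NotInlinable                   // like class C with the attributes enabled
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int F() { return 2; }

    public int Count
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        get { return 200; }
    }
}

public class VirtualCalls                   // like class VC
{
    public virtual int F() { return 2; }
    public virtual int Count { get { return 200; } }
}

public static class InliningExperiment
{
    public static int RunInlinable()
    {
        Inlinable c = new Inlinable();
        int s = 0;
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < c.Count; i++) s += c.F();
        return s;
    }

    public static int RunNotInlinable()
    {
        NotInlinable c = new NotInlinable();
        int s = 0;
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < c.Count; i++) s += c.F();
        return s;
    }

    public static int RunVirtual()
    {
        VirtualCalls c = new VirtualCalls();
        int s = 0;
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < c.Count; i++) s += c.F();
        return s;
    }

    // Hypothetical helper: one untimed warm-up run, then one timed run.
    private static void Report(string name, Func<int> run)
    {
        run();
        Stopwatch sw = Stopwatch.StartNew();
        int result = run();
        sw.Stop();
        Console.WriteLine("{0}: sum {1}, {2} ms", name, result, sw.ElapsedMilliseconds);
    }

    public static void Main()
    {
        Report("Inlinable", RunInlinable);
        Report("NotInlinable", RunNotInlinable);
        Report("VirtualCalls", RunVirtual);
    }
}
```

All three variants compute the same sum (40000000), so any difference in the reported times comes from how the calls are dispatched. The size of the gap will vary with the runtime version and build settings.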

Answer 4

I think the problem lies in your time metrics. What are you using to measure the elapsed time?

Just for the record, here are my results:

DoOne() -> 295 ms
DoTwo() -> 291 ms

Code:

        Stopwatch sw = new Stopwatch();

        sw.Start();
        {
            DoOne();
        }
        sw.Stop();

        Console.WriteLine("DoOne() -> {0} ms", sw.ElapsedMilliseconds);

        sw.Reset();

        sw.Start();
        {
            DoTwo();
        }
        sw.Stop();

        Console.WriteLine("DoTwo() -> {0} ms", sw.ElapsedMilliseconds);

Answer 5

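I am seeing a significant penalty for the interface version, but nowhere near the magnitude of the penalty you are seeing.

Can you post a small, complete program that demonstrates the behavior, along with exactly how you are compiling it and exactly which version of the framework you are using?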

Answer 6

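My tests show the interface version to be about 3 times slower when compiled in release mode. When compiled in debug mode they are almost neck and neck.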

--------------------------------------------------------
 DoOne Release (ms) |  92 |  91 |  91 |  92 |  92 |  92
 DoTwo Release (ms) | 313 | 313 | 316 | 352 | 320 | 318
--------------------------------------------------------
 DoOne Debug (ms)   | 535 | 534 | 548 | 536 | 534 | 537
 DoTwo Debug (ms)   | 566 | 570 | 569 | 565 | 568 | 571
--------------------------------------------------------

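EDIT

In my tests I used a slightly modified version of the DoTwo method, so that it was directly comparable to DoOne. (This change did not make any discernible difference to the performance.)

      private static void DoTwo()
      {
         IList<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < A.Count; c++) s += A[c];
         }
      }

The only difference between the IL generated for DoOne and for the (modified) DoTwo is that the callvirt instructions for Add, get_Item and get_Count use IList and ICollection rather than List itself.

I am guessing that the runtime has to do more work to find the actual method implementation when the call goes through an interface (and that the JIT compiler/optimizer can do a better job with the non-interface calls than with the interface calls when compiling in release mode).

Can anybody confirm this?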

Answer 7

I ran this using Jon Skeet's Benchmark Helper and I am not seeing the results you are; the execution time is approximately the same for the two methods.

Question title: Why does casting List<T> to IList<T> result in reduced performance?
