Why are sequence expressions for arrays so slow in F#?

Time: 2015-06-05 17:23:52

Tags: .net f#

The code:

#time "on"

let newVector = [| 
  for v in 1..10000000 ->
    v |] 

let newVector2 = 
  let a = Array.zeroCreate 10000000
  for v in 1..10000000 do
    a.[v-1] <- v
  a

let newVector3 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in 1..10000000 do
    a.Add(v)
  a.ToArray()

gives the following timings in FSI:

--> Timing now on

> 
Real: 00:00:01.121, CPU: 00:00:01.156, GC gen0: 4, gen1: 4, gen2: 4

val newVector : int [] =  [|1; 2; 3; 4; ...|]

> 
Real: 00:00:00.024, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0

val newVector2 : int32 [] =  [|1; 2; 3; 4; ...|]

> 
Real: 00:00:00.173, CPU: 00:00:00.156, GC gen0: 2, gen1: 2, gen2: 2

val newVector3 : int32 [] =  [|1; 2; 3; 4;  ...|]

The picture is not much different in a standalone application (Release mode, no debugger attached, average of 5 runs):

  • Sequence expression: 425
  • Preallocate: 13
  • List: 80

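The third method does not know the final capacity in advance, yet it is still almost 7 times faster. Why are sequence expressions for arrays so slow in F#?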
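For anyone who wants to reproduce the standalone comparison, here is a minimal sketch of a timing harness. The helper `averageMs` and the function names are hypothetical and not taken from the question; the absolute numbers will depend on the compiler version, runtime and machine.

```fsharp
open System.Diagnostics

// Hypothetical helper (not from the question): runs `f` the given number of
// times and returns the average elapsed wall-clock time in milliseconds.
let averageMs (runs: int) (f: unit -> 'a) : float =
    let sw = Stopwatch()
    for _run in 1 .. runs do
        sw.Start()
        f () |> ignore
        sw.Stop()
    float sw.ElapsedMilliseconds / float runs

let n = 10000000

// Array built from a sequence expression.
let viaSeqExpr () = [| for v in 1 .. n -> v |]

// Array preallocated at its final size and filled by index.
let viaPrealloc () =
    let a : int[] = Array.zeroCreate n
    for v in 1 .. n do
        a.[v - 1] <- v
    a

printfn "sequence expression: %.1f ms" (averageMs 5 viaSeqExpr)
printfn "preallocate:         %.1f ms" (averageMs 5 viaPrealloc)
```

Compile in Release mode and run without a debugger attached to get conditions comparable to the ones described above.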

Update

let seq = seq { for v in 1..10000000 do yield v }
let seqArr = seq |> Seq.toArray
Real: 00:00:01.060, CPU: 00:00:01.078, GC gen0: 2, gen1: 2, gen2: 2

let newVector4 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in seq do
    a.Add(v)
  a.ToArray()
Real: 00:00:01.119, CPU: 00:00:01.109, GC gen0: 1, gen1: 1, gen2: 1

open System.Linq
let newVector5 =  seq.ToArray()
Real: 00:00:00.969, CPU: 00:00:00.968, GC gen0: 0, gen1: 0, gen2: 0

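This gives the same timing as the first version and does not depend on GC. So the real question is: why is enumerating 1..10000000 so much slower than the for loops in the second and third cases?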

Update 2

open System
open System.Linq
open System.Collections.Generic
let newVector5 =  seq.ToArray()

let ie count = 
  { new IEnumerable<int> with
      member x.GetEnumerator(): Collections.IEnumerator = x.GetEnumerator() :> Collections.IEnumerator

      member x.GetEnumerator(): IEnumerator<int> = 
        let c = ref 0
        { new IEnumerator<int> with
            member y.MoveNext() = 
              if !c < count then
                c := !c + 1
                true
              else false

            // MoveNext has already advanced the counter, so returning it directly yields 1..count
            member y.Current with get() = !c
            member y.Current with get() = !c :> obj
            member y.Dispose() = () 
            member y.Reset() = ()       
        }
  }


let newVector6 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in ie 10000000 do
    a.Add(v)
  a.ToArray()
Real: 00:00:00.185, CPU: 00:00:00.187, GC gen0: 1, gen1: 1, gen2: 1

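The hand-written IEnumerable implementation is on par with the for loop. I wonder why the expansion of lo..hi has to be so much slower in the general case. It could be implemented via method overloads, at least for the most common types.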

2 answers:

Answer 0 (score: 8)

In cases like this I always inspect the generated code using one of the many excellent decompilers available for .NET.

let count = 10000000

let explicitArray () = 
  let a = Array.zeroCreate count
  for v in 1..count do
    a.[v-1] <- v
  a

This is compiled into the equivalent of the following C#:

public static int[] explicitArray()
{
    int[] a = ArrayModule.ZeroCreate<int>(10000000);
    for (int v = 1; v < 10000001; v++)
    {
        a[v - 1] = v;
    }
    return a;
}

That is about as efficient as it gets.

let arrayExpression () = 
  [|  for v in 1..count -> v |] 

This, on the other hand, becomes:

public static int[] arrayExpression()
{
    return SeqModule.ToArray<int>(new Program.arrayExpression@7(0, null, 0, 0));
}

which is equivalent to:

let arrayExpression () = 
  let e = seq { for v in 1..count -> v }
  let a = List() // do not set capacity
  for v in e do
    a.Add(v)
  a.ToArray()

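When iterating over a seq (an alias for IEnumerable), MoveNext is called first and then Current. These are virtual calls that the JIT compiler can rarely inline. Inspecting the JIT-compiled assembly code, we see: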

mov         rax,qword ptr [rbp+10h]  
cmp         byte ptr [rax],0  
mov         rcx,qword ptr [rbp+10h]  
lea         r11,[7FFC07830030h]  
# virtual call .MoveNext
call        qword ptr [7FFC07830030h]  
movzx       ecx,al  
# if .MoveNext returns false then exit
test        ecx,ecx  
je          00007FFC079408A0  
mov         rcx,qword ptr [rbp+10h]  
lea         r11,[7FFC07830038h]  
# virtual call .Current
call        qword ptr [7FFC07830038h]  
mov         edx,eax  
mov         rcx,rdi  
# call .Add
call        00007FFC65C8B300  
# loop
jmp         00007FFC07940863  

Compare this with the JIT-compiled code for the version that uses ResizeArray (List):

lea         edx,[rdi-1]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
mov         edx,edi  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
lea         edx,[rdi+1]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
lea         edx,[rdi+2]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
add         edi,4  
# loop
cmp         edi,989682h  
jl          00007FFC07910384  

Here the JIT compiler has unrolled the loop 4 times, and the only calls left are non-virtual calls to List.Add.

This explains why F# array expressions are slower than your other two examples.
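To make the per-element cost concrete, here is a rough sketch of what a `for v in e do` loop amounts to when `e` is only known as a seq. The function name `toArrayViaEnumerator` is made up for illustration, and this is a hand-written approximation, not the compiler's actual output.

```fsharp
open System.Collections.Generic

// Approximation of iterating a seq<int> and collecting it into an array.
// Each element costs two interface calls (MoveNext and Current) on top of
// the List.Add call that the plain for loop also pays.
let toArrayViaEnumerator (e: seq<int>) : int[] =
    let a = List<int>()                 // capacity deliberately not set
    use en = e.GetEnumerator()          // interface call, yields an IEnumerator<int>
    while en.MoveNext() do              // interface call per element
        a.Add(en.Current)               // second interface call per element
    a.ToArray()

let small = toArrayViaEnumerator (seq { for v in 1 .. 5 -> v })
printfn "%A" small
```

The plain `for v in 1..count do` loop has none of these interface calls, which is what allows the JIT compiler to unroll it.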

To fix this, the optimizer in F# would have to be changed to recognize the shape of expressions such as:

seq { for v in 1..count -> v } |> Seq.toArray

and optimize them into:

let a = Array.zeroCreate count
for v in 1..count do
    a.[v-1] <- v
a

The challenge is to find an optimization that is general enough to be useful but does not break the semantics of F#.
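Until the compiler does this on its own, one way to get the preallocated behaviour by hand is `Array.init`, which allocates the array at its final size and fills it by index. This is a manual alternative, not the optimizer change described above; `viaInit` is an illustrative name.

```fsharp
let count = 10000000

// Array.init allocates `count` elements once and calls the function
// with each zero-based index, so no enumerator is involved.
let viaInit : int[] = Array.init count (fun i -> i + 1)

printfn "first = %d, last = %d" viaInit.[0] viaInit.[count - 1]
```

This only applies when the length is known up front and each element can be computed from its index.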

Answer 1 (score: 4)

For grins, I threw your array comprehension into ILDASM to see what happens. I put the let into main and got this:

.locals init ([0] int32[] newVector,
       [1] class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string[],class [FSharp.Core]Microsoft.FSharp.Core.Unit> V_1,
       [2] string[] V_2)
IL_0000:  nop
IL_0001:  ldc.i4.0
IL_0002:  ldnull
IL_0003:  ldc.i4.0
IL_0004:  ldc.i4.0
IL_0005:  newobj     instance void Program/newVector@11::.ctor(int32,
                                                             class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>,
                                                             int32,
                                                             int32)
IL_000a:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToArray<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000f:  stloc.0

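So an instance of newVector@11 is created, and that class inherits from GeneratedSequenceBase, which itself implements IEnumerable. That makes sense, because the next thing that happens is a call to Seq.ToArray. Looking at that class, there is no way to determine the length of the sequence, because of the nature of IEnumerable, even though in this case it is knowable. This tells me it should be equivalent to: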

let seqArr = seq { for v in 1..10000000 do yield v } |> Seq.toArray

and the timing bears this out:

Real: 00:00:01.032, CPU: 00:00:01.060, GC gen0: 4, gen1: 4, gen2: 4

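For one last bit of fun, I ran the sequence comprehension above through ILDASM as well. The wrapper class and its GenerateNext method are instruction-for-instruction identical to those of the array comprehension. So I think it is very safe to conclude that any array comprehension of the form: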

let arr = [| sequence-expr |]

is 100% equivalent to:

let arr = seq { sequence-expr } |> Seq.toArray

This is hinted at in the F# array documentation with the following:

  

You can also use sequence expressions to create arrays. Following is an example that creates an array of squares of integers from 1 to 10.

which is really saying "it's a sequence, not its own thing."
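As a small illustration of the two forms side by side, using the squares-of-1-to-10 case the documentation mentions (the binding names are mine), the following checks that both produce the same values. It only compares results; it says nothing about the generated IL.

```fsharp
// Array comprehension form.
let squaresArr = [| for i in 1 .. 10 -> i * i |]

// Explicit sequence expression piped into Seq.toArray.
let squaresSeq = seq { for i in 1 .. 10 -> i * i } |> Seq.toArray

// F# arrays compare structurally with (=), so this prints true.
printfn "%b" (squaresArr = squaresSeq)
```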