Can compiler optimizations (such as ghc -O2) change the order (time or space) of a program?

Time: 2011-10-03 13:42:09

Tags: haskell ghc compiler-optimization

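I have a feeling the answer is yes, and not only for Haskell. For example, tail-call optimization changes the memory requirement from O(n) to O(1), right?

My precise concern is this: in the context of Haskell, what is one expected to understand about compiler optimizations when reasoning about the performance and size of a program?

In Scheme, you can take some optimizations for granted, such as TCO, because you are using an interpreter/compiler that conforms to the specification.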

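1 answer:

Answer 0: (score: 15)

Yes. In particular, GHC performs strictness analysis, which can drastically reduce the space usage of a program with unintended laziness, from O(n) to O(1).

For example, consider this trivial program: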

$ cat LazySum.hs
main = print $ sum [1..100000]

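Since sum does not assume that the addition operator is strict (it might be used with a Num instance whose (+) is lazy), it causes a large number of thunks to be allocated. Without optimizations enabled, strictness analysis is not performed.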

$ ghc --make LazySum.hs -rtsopts -fforce-recomp
[1 of 1] Compiling Main             ( LazySum.hs, LazySum.o )
Linking LazySum ...
$ ./LazySum +RTS -s
./LazySum +RTS -s 
5000050000
      22,047,576 bytes allocated in the heap
      18,365,440 bytes copied during GC
       6,348,584 bytes maximum residency (4 sample(s))
       3,133,528 bytes maximum slop
              15 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    23 collections,     0 parallel,  0.04s,  0.03s elapsed
  Generation 1:     4 collections,     0 parallel,  0.01s,  0.02s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.01s  (  0.03s elapsed)
  GC    time    0.05s  (  0.04s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.06s  (  0.07s elapsed)

  %GC time      83.3%  (58.0% elapsed)

  Alloc rate    2,204,757,600 bytes per MUT second

  Productivity  16.7% of total user, 13.7% of total elapsed
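
To make the thunk build-up concrete, here is a minimal sketch of a lazy left fold written out by hand. The name lazySum is hypothetical and this is not the library definition of sum; it only mirrors the accumulation pattern described above, assuming it is compiled without optimizations.

```haskell
-- Hypothetical helper, not the library's sum: a hand-written lazy
-- accumulating loop, assumed to be compiled without optimizations.
lazySum :: [Integer] -> Integer
lazySum = go 0
  where
    go acc []       = acc
    -- (acc + x) is passed along unevaluated, so the accumulator grows
    -- into a chain of suspended additions: ((0 + 1) + 2) + 3 ...
    go acc (x : xs) = go (acc + x) xs

main :: IO ()
main = print (lazySum [1 .. 100000])
```

Nothing demands the value of the accumulator until print does, so the whole chain of pending additions is alive at once, which is where the O(n) residency comes from.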

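However, if we compile with optimizations enabled, the strictness analyzer determines that, since we are using the addition operator for Integer, which is known to be strict, it is safe to evaluate the thunks ahead of time, so the program runs in constant space.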

$ ghc --make -O2 LazySum.hs -rtsopts -fforce-recomp
[1 of 1] Compiling Main             ( LazySum.hs, LazySum.o )
Linking LazySum ...
$ ./LazySum +RTS -s
./LazySum +RTS -s 
5000050000
       9,702,512 bytes allocated in the heap
           8,112 bytes copied during GC
          27,792 bytes maximum residency (1 sample(s))
          20,320 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    18 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.01s  (  0.02s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.01s  (  0.02s elapsed)

  %GC time       0.0%  (2.9% elapsed)

  Alloc rate    970,251,200 bytes per MUT second

  Productivity 100.0% of total user, 55.0% of total elapsed
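
Roughly speaking, the optimized program behaves as if the accumulator had been forced by hand at every step. The following is only a sketch of that idea, with a hypothetical name strictSum; it is not the code GHC actually generates.

```haskell
-- Hypothetical hand-written loop, not GHC's generated code: the
-- accumulator is forced with seq before each recursive call.
strictSum :: [Integer] -> Integer
strictSum = go 0
  where
    go acc []       = acc
    go acc (x : xs) =
      let acc' = acc + x
      in acc' `seq` go acc' xs  -- evaluate acc' now instead of storing a thunk

main :: IO ()
main = print (strictSum [1 .. 100000])
```

Because each intermediate sum is evaluated before the loop continues, only one Integer accumulator is live at a time.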

Note that we can get constant space even without optimizations if we add the strictness ourselves:

$ cat StrictSum.hs 
import Data.List (foldl')
main = print $ foldl' (+) 0 [1..100000]
$ ghc --make StrictSum.hs -rtsopts -fforce-recomp
[1 of 1] Compiling Main             ( StrictSum.hs, StrictSum.o )
Linking StrictSum ...
$ ./StrictSum +RTS -s
./StrictSum +RTS -s 
5000050000
       9,702,664 bytes allocated in the heap
           8,144 bytes copied during GC
          27,808 bytes maximum residency (1 sample(s))
          20,304 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    18 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.00s  (  0.01s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.00s  (  0.01s elapsed)

  %GC time       0.0%  (2.1% elapsed)

  Alloc rate    9,702,664,000,000 bytes per MUT second

  Productivity 100.0% of total user, 0.0% of total elapsed

Because of Haskell's evaluation model, strictness tends to be a bigger concern than tail calls.
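
As a small illustration of why tail position matters less under lazy evaluation, consider the sketch below. The function doubleAll is a hypothetical example, not something from the question; its recursive call is not a tail call, yet it works on an infinite list because each element is produced only on demand.

```haskell
-- Hypothetical example: the recursive call sits under the (:)
-- constructor, so it is not a tail call, yet evaluation is driven
-- by the consumer one element at a time.
doubleAll :: [Integer] -> [Integer]
doubleAll []       = []
doubleAll (x : xs) = 2 * x : doubleAll xs

main :: IO ()
main = print (take 5 (doubleAll [1 ..]))  -- prints [2,4,6,8,10]
```

Conversely, a loop whose recursive call is in tail position can still use O(n) space if its accumulator is left unevaluated, which is why strictness is usually the thing to reason about first.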