为什么这里的早期条款不是垃圾收集?

时间:2015-05-05 03:05:40

标签: haskell memory garbage-collection ghc compiler-optimization

如果我将Kolakoski Sequence定义为

kolakoski () !! 500000000

并找到第500,000,000个术语

while index>len(numbers):

我发现当使用ghc -O编译时,这会很快消耗大量内存。但是,如果关闭优化,它几乎不会使用任何东西。哪种优化导致了这种情况,如何将其关闭?

1 个答案:

答案 0 :(得分:6)

让我们比较实际数字。如果在没有优化的情况下运行,您的kolakoski版本将使用大约70k:

$ ghc --make Kolakoski-Unit && ./Kolakoski-Unit +RTS -s
2
 288,002,359,096 bytes allocated in the heap
   1,343,933,816 bytes copied during GC
          67,576 bytes maximum residency (422 sample(s))
          52,128 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     551615 colls,     0 par    1.89s    2.30s     0.0000s    0.0001s
  Gen  1       422 colls,     0 par    0.02s    0.02s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   37.34s  ( 37.25s elapsed)
  GC      time    1.91s  (  2.33s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   39.25s  ( 39.58s elapsed)

  %GC     time       4.9%  (5.9% elapsed)

  Alloc rate    7,712,197,063 bytes per MUT second

  Productivity  95.1% of total user, 94.3% of total elapsed

通过优化,它使用大约4GB(尽管任务管理器中的实际内存使用量高达~6GB)。

$ ghc --make Kolakoski-Unit -O && ./Kolakoski-Unit +RTS -s
2
  64,000,066,608 bytes allocated in the heap
  27,971,527,816 bytes copied during GC
   3,899,031,480 bytes maximum residency (34 sample(s))
      91,679,728 bytes maximum slop
            9549 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     122806 colls,     0 par    8.67s    9.48s     0.0001s    0.0148s
  Gen  1        34 colls,     0 par   11.55s   69.78s     2.0524s    56.2970s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    8.80s  (  8.43s elapsed)
  GC      time   20.22s  ( 79.26s elapsed)
  EXIT    time    0.03s  (  0.89s elapsed)
  Total   time   29.05s  ( 88.58s elapsed)

  %GC     time      69.6%  (89.5% elapsed)

  Alloc rate    7,275,318,406 bytes per MUT second

  Productivity  30.4% of total user, 10.0% of total elapsed

如果我们使用基于列表的版本而不使用优化,则内存消耗与启用了优化的内存消耗非常相似:

kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper 
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))
$ ghc --make Kolakoski-List && ./Kolakoski-List +RTS -s
2
  96,000,143,328 bytes allocated in the heap
  26,615,974,536 bytes copied during GC
   3,565,429,808 bytes maximum residency (34 sample(s))
      83,610,688 bytes maximum slop
            8732 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     184252 colls,     0 par    9.98s   10.16s     0.0001s    0.0006s
  Gen  1        34 colls,     0 par   10.45s   21.61s     0.6357s    12.0792s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   12.02s  ( 11.88s elapsed)
  GC      time   20.44s  ( 31.77s elapsed)
  EXIT    time    0.05s  (  0.69s elapsed)
  Total   time   32.50s  ( 44.34s elapsed)

  %GC     time      62.9%  (71.7% elapsed)

  Alloc rate    7,989,608,807 bytes per MUT second

  Productivity  37.1% of total user, 27.2% of total elapsed

现在,我们可以检查GHC flag reference以查找在-O上自动设置的标记。由于列表版本似乎与优化版本相同,因此人们可能会认为GHC将kolakoski转换为kolakoskiList

kolakoskiOptimized _ = kolakoskiList

让我们通过-ddump-simpl -dsupress-all

核实这一点
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 45, types: 30, coercions: 0}

kolakoski
kolakoski =
  \ ds_d10r ->
    case ds_d10r of _ { () ->
    : (I# 1)
      (: (I# 2)
         (letrec {
            helper_aNo
            helper_aNo =
              \ ds1_d10s ->
                case ds1_d10s of _ { () ->
                : (I# 2)
                  (concat
                     (zipWith
                        (replicate) (helper_aNo ()) (cycle (: (I# 1) (: (I# 2) ([]))))))
                }; } in
          helper_aNo ()))
    }

main
main = print $fShowInt (!! (kolakoski ()) (I# 500000000))

main
main = runMainIO main

即使您不熟悉GHC的核心,您也可以看到kolakoski与原始版本大致相同。现在将其与-O -ddump-simpl -dsupress-all

进行比较
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 125, types: 102, coercions: 9}

kolakoski6
kolakoski6 = I# 1

kolakoski5
kolakoski5 = I# 2

Rec {
go_r1NG
go_r1NG =
  \ ds_a14B _ys_a14C ->
    case ds_a14B of _ {
      [] -> [];
      : ipv_a14H ipv1_a14I ->
        case _ys_a14C of _ {
          [] -> [];
          : ipv2_a14O ipv3_a14P ->
            case ipv_a14H of _ { I# n#_a13J ->
            case tagToEnum# (<=# n#_a13J 0) of _ {
              False ->
                let {
                  lvl2_s1N3
                  lvl2_s1N3 = : ipv2_a14O ([]) } in
                letrec {
                  xs_a1LH
                  xs_a1LH =
                    \ m_a1LO ->
                      case tagToEnum# (<=# m_a1LO 1) of _ {
                        False -> : ipv2_a14O (xs_a1LH (-# m_a1LO 1));
                        True -> lvl2_s1N3
                      }; } in
                ++ (xs_a1LH n#_a13J) (go_r1NG ipv1_a14I ipv3_a14P);
              True -> ++ ([]) (go_r1NG ipv1_a14I ipv3_a14P)
            }
            }
        }
    }
end Rec }

lvl_r1NH
lvl_r1NH = : kolakoski5 ([])

lvl1_r1NI
lvl1_r1NI = : kolakoski6 lvl_r1NH

Rec {
xs'_r1NJ
xs'_r1NJ = ++ lvl1_r1NI xs'_r1NJ
end Rec }

Rec {
kolakoski3
kolakoski3 = : kolakoski5 kolakoski4

kolakoski4
kolakoski4 = go_r1NG kolakoski3 xs'_r1NJ
end Rec }

kolakoski2
kolakoski2 = : kolakoski5 kolakoski3

kolakoski1
kolakoski1 = : kolakoski6 kolakoski2

kolakoski
kolakoski = \ ds_d13p -> case ds_d13p of _ { () -> kolakoski1 }

此版本包含几个顶级CAFs,它们在程序的生命周期内保留。因此,您确实生成了最多500,000,000个值的列表并保存

现在,那里发生了什么?函数内部的东西向外浮动。让我们再次检查标志参考。这是一个有希望的旗帜,由-O暗示:

  

-ffull-laziness打开完全懒惰(向外浮动绑定)。 隐含-O

这就是导致你出现问题的旗帜。实际上,您可以使用ghc --make -O -fno-full-laziness Kolakoski-Unit.hs来获取原始内存消耗:

$ ghc --make Kolakoski-Unit.hs -O -fno-full-laziness && ./Kolakoski-Unit +RTS -s
2
 192,001,417,688 bytes allocated in the heap
     637,962,464 bytes copied during GC
          66,104 bytes maximum residency (151 sample(s))
          43,448 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     368364 colls,     0 par    1.34s    1.32s     0.0000s    0.0002s
  Gen  1       151 colls,     0 par    0.00s    0.01s     0.0001s    0.0003s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   27.89s  ( 28.13s elapsed)
  GC      time    1.34s  (  1.33s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   29.25s  ( 29.46s elapsed)

  %GC     time       4.6%  (4.5% elapsed)

  Alloc rate    6,884,084,443 bytes per MUT second

  Productivity  95.4% of total user, 94.7% of total elapsed

相关问题