Question

I'm running a long-lived Haskell program that holds on to a lot of memory. Running with +RTS -N5 -s -A25M (the size of my L3 cache), I see:

715,584,711,208 bytes allocated in the heap
390,936,909,408 bytes copied during GC
  4,731,021,848 bytes maximum residency (745 sample(s))
     76,081,048 bytes maximum slop
           7146 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0     24103 colls, 24103 par   240.99s   104.44s     0.0043s    0.0603s
Gen  1       745 colls,   744 par   2820.18s   619.27s     0.8312s    1.3200s

Parallel GC work balance: 50.36% (serial 0%, perfect 100%)

TASKS: 18 (1 bound, 17 peak workers (17 total), using -N5)

SPARKS: 1295 (1274 converted, 0 overflowed, 0 dud, 0 GC'd, 21 fizzled)

INIT    time    0.00s  (  0.00s elapsed)
MUT     time  475.11s  (454.19s elapsed)
GC      time  3061.18s  (723.71s elapsed)
EXIT    time    0.27s  (  0.50s elapsed)
Total   time  3536.57s  (1178.41s elapsed)

Alloc rate    1,506,148,218 bytes per MUT second

Productivity  13.4% of total user, 40.3% of total elapsed

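GC time is 87% of the total run time! I'm running this on a system with plenty of RAM, but when I set a high -H value, performance gets worse.

It seems that both -H and -A control the size of gen 0, but what I really want to do is increase the size of gen 1. What's the best way to do this?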

Answer 1

As Carl suggested, you should check your code for space leaks. I'll assume your program really does need a lot of memory.
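One cheap way to keep an eye on this from inside a long-running program is to sample the RTS counters. The sketch below assumes a GHC recent enough to ship GHC.Stats with getRTSStats (base 4.10 or later) and that the program is started with +RTS -T (or -s) so the counters are collected. reportGc and share are names made up for this example. Live data that keeps climbing between reports would hint at a leak rather than a genuinely large working set.

```haskell
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Percentage that `part` makes up of `whole` (0 when `whole` is 0).
share :: Integral a => a -> a -> Double
share part whole
  | whole == 0 = 0
  | otherwise  = 100 * fromIntegral part / fromIntegral whole

-- Hypothetical helper: print a one-line GC summary.
-- Call it periodically from the long-running program.
reportGc :: IO ()
reportGc = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are disabled; run with +RTS -T"
    else do
      s <- getRTSStats
      putStrLn $
        "major GCs: " ++ show (major_gcs s)
          ++ ", live bytes at last GC: " ++ show (gcdetails_live_bytes (gc s))
          ++ ", max live bytes: " ++ show (max_live_bytes s)
          ++ ", GC share of CPU time (%): " ++ show (share (gc_cpu_ns s) (cpu_ns s))

main :: IO ()
main = reportGc
```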

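The program spent 2820.18s doing major GC. You can lower that either by reducing memory usage (not an option under the assumption above) or by reducing the number of major collections. You have plenty of free RAM, so you can try the -Ffactor option: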

 -Ffactor

    [Default: 2] This option controls the amount of memory reserved for
 the older generations (and in the case of a two space collector the size
 of the allocation area) as a factor of the amount of live data. For
 example, if there was 2M of live data in the oldest generation when we
 last collected it, then by default we'll wait until it grows to 4M before
 collecting it again.

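In your case there is ~3G of live data. By default, a major GC is triggered when the heap grows to 6G. With -F3 it is triggered when the heap grows to 9G, saving you roughly 1000s of CPU time.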
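To make the arithmetic concrete, here is a tiny sketch of the rule quoted from the docs: the old generation is collected again once it reaches roughly factor times the live data seen at the last major GC. The function name is invented for this example, and the 3G figure is just the estimate used above. The real -F option also accepts fractional values; whole numbers are used here only to keep the output tidy.

```haskell
-- Rough model of the -F rule: next major GC happens when the old
-- generation reaches about (factor * live data at the last major GC).
majorGcThresholdGB :: Int -> Int -> Int
majorGcThresholdGB factor liveGB = factor * liveGB

main :: IO ()
main = mapM_ report [2, 3, 4]
  where
    liveGB :: Int
    liveGB = 3 -- estimate of live data from the answer, in GB

    report :: Int -> IO ()
    report f =
      putStrLn $
        "-F" ++ show f ++ ": major GC at about "
          ++ show (majorGcThresholdGB f liveGB) ++ "G"
```

Fewer major collections is where the saving comes from, at the cost of a larger heap. On the command line this would look something like ./myprog +RTS -N5 -s -A25M -F3, where myprog stands in for your executable.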

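If most of the live data is static (e.g. it never changes, or changes slowly), then you may be interested in a stable heap. The idea is to exclude long-lived data from major GC. This can be achieved, for example, using compact normal forms, though that work has not been merged into GHC yet.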
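For readers on a newer compiler: as far as I know, the compact normal forms work has since shipped as compact regions in GHC 8.2, exposed through the GHC.Compact module of the ghc-compact package. A minimal sketch, assuming that package and containers are available; the lookup table is a made-up stand-in for your long-lived data:

```haskell
import qualified Data.Map.Strict as Map
import GHC.Compact (compact, getCompact)

main :: IO ()
main = do
  -- Made-up stand-in for a large, mostly static data structure.
  let table = Map.fromList [(k, k * k) | k <- [1 .. 100000 :: Int]]
  -- compact fully evaluates the value and copies it into its own region.
  -- The GC keeps the region alive as a whole instead of tracing inside it.
  region <- compact table
  -- getCompact hands back an ordinary value that lives in the region.
  let stable = getCompact region
  print (Map.lookup 12 stable)
```

Compacting only works for plain immutable data (no functions or mutable references inside), and the copy itself costs time and memory, so it pays off mainly for data that is built once and then read for a long time.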
