Question

我对Haskell很新，我想创建一个直方图。我正在使用Data.Vector.Unboxed融合数据操作;这是快速的（当使用-O -fllvm编译时）和瓶颈是我的折叠应用程序;汇总了桶数。

如何让它更快？我读过关于通过严格控制来减少thunk数量的问题，所以我通过使用seq和foldr来解决问题。但没有看到太多的性能提升。强烈鼓励你的想法。

import qualified Data.Vector.Unboxed as V

histogram :: [(Int,Int)]
histogram = V.foldr' agg [] $ V.zip k v
 where 
    n = 10000000
    c = 1000000
    k = V.generate n (\i -> i `div` c * c)
    v = V.generate n (\i -> 1)
    agg kv [] = [kv]
    agg kv@(k,v) acc@((ck,cv):as)
        | k == ck = let a = (ck,cv+v):as in a `seq` a
        | otherwise = let a = kv:acc in a `seq` a

main :: IO ()
main = print histogram

编译：

ghc --make -O -fllvm histogram.hs

Answer 1

首先，使用-O2 -rtsopts编译程序。然后，为了获得可以优化的第一个想法，使用选项+RTS -sstderr运行程序：

$ ./question +RTS -sstderr
[(0,1000000),(1000000,1000000),(2000000,1000000),(3000000,1000000),(4000000,1000000),(5000000,1000000),(6000000,1000000),(7000000,1000000),(8000000,1000000),(9000000,1000000)]
   1,193,907,224 bytes allocated in the heap
   1,078,027,784 bytes copied during GC
     282,023,968 bytes maximum residency (7 sample(s))
      86,755,184 bytes maximum slop
             763 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1964 colls,     0 par    3.99s    4.05s     0.0021s    0.0116s
  Gen  1         7 colls,     0 par    1.60s    1.68s     0.2399s    0.6665s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.67s  (  2.68s elapsed)
  GC      time    5.59s  (  5.73s elapsed)
  EXIT    time    0.02s  (  0.03s elapsed)
  Total   time    8.29s  (  8.43s elapsed)

  %GC     time      67.4%  (67.9% elapsed)

  Alloc rate    446,869,876 bytes per MUT second

  Productivity  32.6% of total user, 32.0% of total elapsed

请注意， 67％的时间用于GC！显然有些不对劲。为了找出问题所在，我们可以在启用堆分析的情况下运行程序（使用+RTS -h），这会产生下图：

First heap profile

所以，你正在泄漏thunk。这是怎么发生的？查看代码，在agg中建立（递归）thunk的唯一时间是进行添加时。通过添加爆炸模式使cv严格，从而解决了问题：

{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V

histogram :: [(Int,Int)]
histogram = V.foldr' agg [] $ V.zip k v
 where
    n = 10000000
    c = 1000000
    k = V.generate n (\i -> i `div` c * c)
    v = V.generate n id
    agg kv [] = [kv]
    agg kv@(k,v) acc@((ck,!cv):as) -- Note the !
        | k == ck = (ck,cv+v):as
        | otherwise = kv:acc

main :: IO ()
main = print histogram

输出：

$ time ./improved +RTS -sstderr 
[(0,499999500000),(1000000,1499999500000),(2000000,2499999500000),(3000000,3499999500000),(4000000,4499999500000),(5000000,5499999500000),(6000000,6499999500000),(7000000,7499999500000),(8000000,8499999500000),(9000000,9499999500000)]
     672,063,056 bytes allocated in the heap
          94,664 bytes copied during GC
     160,028,816 bytes maximum residency (2 sample(s))
       1,464,176 bytes maximum slop
             155 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       992 colls,     0 par    0.03s    0.03s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.03s    0.03s     0.0161s    0.0319s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.24s  (  1.25s elapsed)
  GC      time    0.06s  (  0.06s elapsed)
  EXIT    time    0.03s  (  0.03s elapsed)
  Total   time    1.34s  (  1.34s elapsed)

  %GC     time       4.4%  (4.5% elapsed)

  Alloc rate    540,674,868 bytes per MUT second

  Productivity  95.5% of total user, 95.1% of total elapsed

./improved +RTS -sstderr  1,14s user 0,20s system 99% cpu 1,352 total

这要好得多。

现在您可以问，即使您使用seq，为什么会出现问题？原因是seq只强制第一个参数为WHNF，对于一对，(_,_)（其中_是未评估的thunk）已经是WHNF了！另外，seq a a与a相同，因为它seq a b（非正式地）表示：评估b之前评估，所以seq a a只是意味着：在评估之前评估a ，这与仅评估a

相同

在Haskell中更快地进行直方图计算

1 个答案: