我对Haskell很新,我想创建一个直方图。我正在使用Data.Vector.Unboxed融合数据操作;这是快速的(当使用-O -fllvm编译时)和瓶颈是我的折叠应用程序;汇总了桶数。
如何让它更快?我读过关于通过严格控制来减少thunk数量的问题,所以我通过使用seq和foldr来解决问题。但没有看到太多的性能提升。强烈鼓励你的想法。
import qualified Data.Vector.Unboxed as V
histogram :: [(Int,Int)]
histogram = V.foldr' agg [] $ V.zip k v
where
n = 10000000
c = 1000000
k = V.generate n (\i -> i `div` c * c)
v = V.generate n (\i -> 1)
agg kv [] = [kv]
agg kv@(k,v) acc@((ck,cv):as)
| k == ck = let a = (ck,cv+v):as in a `seq` a
| otherwise = let a = kv:acc in a `seq` a
main :: IO ()
main = print histogram
编译:
ghc --make -O -fllvm histogram.hs
答案 0 :(得分:15)
首先,使用-O2 -rtsopts
编译程序。然后,为了获得可以优化的第一个想法,使用选项+RTS -sstderr
运行程序:
$ ./question +RTS -sstderr
[(0,1000000),(1000000,1000000),(2000000,1000000),(3000000,1000000),(4000000,1000000),(5000000,1000000),(6000000,1000000),(7000000,1000000),(8000000,1000000),(9000000,1000000)]
1,193,907,224 bytes allocated in the heap
1,078,027,784 bytes copied during GC
282,023,968 bytes maximum residency (7 sample(s))
86,755,184 bytes maximum slop
763 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1964 colls, 0 par 3.99s 4.05s 0.0021s 0.0116s
Gen 1 7 colls, 0 par 1.60s 1.68s 0.2399s 0.6665s
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.67s ( 2.68s elapsed)
GC time 5.59s ( 5.73s elapsed)
EXIT time 0.02s ( 0.03s elapsed)
Total time 8.29s ( 8.43s elapsed)
%GC time 67.4% (67.9% elapsed)
Alloc rate 446,869,876 bytes per MUT second
Productivity 32.6% of total user, 32.0% of total elapsed
请注意, 67%的时间用于GC!显然有些不对劲。为了找出问题所在,我们可以在启用堆分析的情况下运行程序(使用+RTS -h
),这会产生下图:
所以,你正在泄漏thunk。这是怎么发生的?查看代码,在agg
中建立(递归)thunk的唯一时间是进行添加时。通过添加爆炸模式使cv
严格,从而解决了问题:
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V
histogram :: [(Int,Int)]
histogram = V.foldr' agg [] $ V.zip k v
where
n = 10000000
c = 1000000
k = V.generate n (\i -> i `div` c * c)
v = V.generate n id
agg kv [] = [kv]
agg kv@(k,v) acc@((ck,!cv):as) -- Note the !
| k == ck = (ck,cv+v):as
| otherwise = kv:acc
main :: IO ()
main = print histogram
输出:
$ time ./improved +RTS -sstderr
[(0,499999500000),(1000000,1499999500000),(2000000,2499999500000),(3000000,3499999500000),(4000000,4499999500000),(5000000,5499999500000),(6000000,6499999500000),(7000000,7499999500000),(8000000,8499999500000),(9000000,9499999500000)]
672,063,056 bytes allocated in the heap
94,664 bytes copied during GC
160,028,816 bytes maximum residency (2 sample(s))
1,464,176 bytes maximum slop
155 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 992 colls, 0 par 0.03s 0.03s 0.0000s 0.0001s
Gen 1 2 colls, 0 par 0.03s 0.03s 0.0161s 0.0319s
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.24s ( 1.25s elapsed)
GC time 0.06s ( 0.06s elapsed)
EXIT time 0.03s ( 0.03s elapsed)
Total time 1.34s ( 1.34s elapsed)
%GC time 4.4% (4.5% elapsed)
Alloc rate 540,674,868 bytes per MUT second
Productivity 95.5% of total user, 95.1% of total elapsed
./improved +RTS -sstderr 1,14s user 0,20s system 99% cpu 1,352 total
这要好得多。
现在您可以问,即使您使用seq
,为什么会出现问题?原因是seq
只强制第一个参数为WHNF,对于一对,(_,_)
(其中_是未评估的thunk)已经是WHNF了!另外,seq a a
与a
相同,因为它seq a b
(非正式地)表示:评估b之前评估,所以seq a a
只是意味着:在评估之前评估a ,这与仅评估a