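I have just started reading Parallel and Concurrent Programming in Haskell.
I wrote two programs that, I believe, sum a list in two ways: by running rpar (force (sum list)) on the whole list, and by splitting the list in two, sparking a sum for each half, and adding the results.
Here is the code: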
import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment

main :: IO ()
main = do
  [n] <- getArgs
  [single, faster] !! (read n - 1)

single :: IO ()
single = print . runEval $ rpar (sum list)

faster :: IO ()
faster = print . runEval $ do
  let (as, bs) = splitAt ((length list) `div` 2) list
  res1 <- rpar (sum as)
  res2 <- rpar (sum bs)
  return (res1 + res2)

list :: [Integer]
list = [1..10000000]
Compiled with parallelism enabled (-threaded):
C:\Users\k\Workspace\parallel_concurrent_haskell>ghc Sum.hs -O2 -threaded -rtsopts
[1 of 1] Compiling Main ( Sum.hs, Sum.o )
Linking Sum.exe ...
Running with the single strategy:
C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 1 +RTS -s -N2
50000005000000
960,065,896 bytes allocated in the heap
363,696 bytes copied during GC
43,832 bytes maximum residency (2 sample(s))
57,016 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1837 colls, 1837 par 0.00s 0.01s 0.0000s 0.0007s
Gen 1 2 colls, 1 par 0.00s 0.00s 0.0002s 0.0003s
Parallel GC work balance: 0.18% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.27s ( 0.27s elapsed)
GC time 0.00s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.27s ( 0.28s elapsed)
Alloc rate 3,614,365,726 bytes per MUT second
Productivity 100.0% of total user, 95.1% of total elapsed
gc_alloc_block_sync: 573
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
Running with faster:
C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 2 +RTS -s -N2
50000005000000
1,600,100,336 bytes allocated in the heap
1,477,564,464 bytes copied during GC
400,027,984 bytes maximum residency (14 sample(s))
70,377,336 bytes maximum slop
911 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 3067 colls, 3067 par 1.05s 0.68s 0.0002s 0.0021s
Gen 1 14 colls, 13 par 1.98s 1.53s 0.1093s 0.5271s
Parallel GC work balance: 0.00% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.38s ( 1.74s elapsed)
GC time 3.03s ( 2.21s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.42s ( 3.95s elapsed)
Alloc rate 4,266,934,229 bytes per MUT second
Productivity 11.4% of total user, 9.9% of total elapsed
gc_alloc_block_sync: 335
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
Why does single finish in 0.28 seconds, while faster (poorly named, obviously) takes 3.95 seconds?
Answer 0 (score: 5)
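I'm not an expert in Haskell-specific profiling, but I can see several possible problems in faster. You are walking the input list at least three times: once to get its length, once for splitAt (maybe twice; I'm not entirely sure how it is implemented), and then again to read and sum its elements. In single, the list is walked only once.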
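With faster you also hold the entire list in memory at once, whereas with single Haskell can process it lazily and the GC can clean up as it goes. If you look at the profiling output, you can see that faster copies far more bytes during GC: roughly 4,000 times as many! faster also needed 400MB of memory at peak, where single needed only about 40KB at any one time. So the garbage collector has a much larger heap to keep scanning.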
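To make the comparison concrete, here is a small sketch that works out the two ratios from the +RTS -s reports in the question. The helper name ratio is made up for this illustration; the four numbers are copied from the "bytes copied during GC" and "maximum residency" lines above.

```haskell
-- Hypothetical helper: a / b rounded to the nearest whole number.
ratio :: Integer -> Integer -> Integer
ratio a b = round (fromIntegral a / fromIntegral b :: Double)

main :: IO ()
main = do
  -- "bytes copied during GC": faster vs. single
  putStrLn ("bytes copied, faster / single:      " ++ show (ratio 1477564464 363696))
  -- "bytes maximum residency": faster vs. single
  putStrLn ("max residency, faster / single:     " ++ show (ratio 400027984 43832))
```

Both ratios come out in the thousands (roughly 4,000 for bytes copied and roughly 9,000 for residency), which fits the GC time dominating the faster run.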
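Another big issue: in faster you allocate a lot of new cons cells to hold the two intermediate sublists. Even if all of it could be collected immediately, that is a lot of time spent allocating. It is probably more expensive than just doing the additions in the first place! So you are already "over budget" compared with single before you even start adding.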
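One way to see the point about allocation is to sum each half without building any list at all. The following is only a sketch under assumptions, not the asker's code: sumRange and sumHalves are hypothetical names, each half is described by its bounds instead of by splitAt, and the rpar/rseq ordering is one common pattern rather than something the answer prescribes.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Hypothetical helper: sum the integers lo..hi with a strict accumulator.
-- No list and no cons cells are allocated.
sumRange :: Integer -> Integer -> Integer
sumRange lo hi = go 0 lo
  where
    go !acc i
      | i > hi    = acc
      | otherwise = go (acc + i) (i + 1)

-- Hypothetical variant of `faster`: split 1..n by bounds, not by splitAt.
sumHalves :: Integer -> Integer
sumHalves n = runEval $ do
  let mid = n `div` 2
  lower <- rpar (sumRange 1 mid)        -- sparked; another core may pick it up
  upper <- rseq (sumRange (mid + 1) n)  -- evaluated on the current thread
  _     <- rseq lower                   -- make sure the sparked half is done
  return (lower + upper)

main :: IO ()
main = print (sumHalves 10000000)
```

Whether the spark actually runs on a second core still depends on the runtime; the SPARKS line of +RTS -s is the place to check.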
Answer 1 (score: 2)
Following on from amalloy's answer: my machine is slower than yours, and running your single took Total time 0.41s (0.35s elapsed).
I tried:
list = [ 1..10000000]
list1 = [ 1..5000000]
list2 = [ 5000001 .. 10000000 ]

fastest :: IO ()
fastest = print . runEval $ do
  res1 <- rpar (sum list1)
  res2 <- rpar (sum list2)
  return (res1 + res2)
and got:
c:\Users\peter\Documents\Haskell\practice>parlist 4 +RTS -s -N2
parlist 4 +RTS -s -N2
50000005000000
960,068,544 bytes allocated in the heap
1,398,472 bytes copied during GC
43,832 bytes maximum residency (3 sample(s))
203,544 bytes maximum slop
3 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1836 colls, 1836 par 0.00s 0.01s 0.0000s 0.0009s
Gen 1 3 colls, 2 par 0.00s 0.00s 0.0002s 0.0004s
Parallel GC work balance: 0.04% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.31s ( 0.33s elapsed)
GC time 0.00s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.31s ( 0.35s elapsed)
Alloc rate 3,072,219,340 bytes per MUT second
Productivity 100.0% of total user, 90.1% of total elapsed
which is faster...
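Note that the SPARKS line in this run still reports 0 converted, so neither half appears to have been evaluated by a second core. As a rough sketch of how the two hand-written halves might generalize, the code below splits 1..n into k ranges and uses parMap from Control.Parallel.Strategies. The names chunkBounds and parSum are invented for this illustration, k is assumed to be at least 1, and nothing here has been benchmarked.

```haskell
import Control.Parallel.Strategies (parMap, rseq)
import Data.List (foldl')

-- Hypothetical helper: inclusive (lo, hi) bounds that split 1..n into
-- at most k chunks of roughly equal size. Assumes k >= 1.
chunkBounds :: Integer -> Integer -> [(Integer, Integer)]
chunkBounds k n = [ (lo, min n (lo + size - 1)) | lo <- [1, 1 + size .. n] ]
  where
    size = max 1 ((n + k - 1) `div` k)

-- Spark one strict sum per chunk, then add the partial sums.
-- Each chunk's list is generated lazily inside its own fold.
parSum :: Integer -> Integer -> Integer
parSum k n = sum (parMap rseq sumChunk (chunkBounds k n))
  where
    sumChunk (lo, hi) = foldl' (+) 0 [lo .. hi]

main :: IO ()
main = print (parSum 4 10000000)
```

As with the runs above, compiling with -threaded and running with +RTS -s -N2 shows in the SPARKS line whether any of the sparks were converted.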