Profiling two functions that sum a large list

Date: 2015-06-17 18:40:57

Tags: haskell parallel-processing

I have just started reading about parallel and concurrent programming in Haskell.

I wrote two programs that, I believe, each sum a list, in two different ways:

  • running rpar (force (sum list))
  • splitting the list in two, running the above on each half, and then adding the two results

Here is the code:

import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment

main :: IO ()
main = do
  [n] <- getArgs
  [single, faster] !! (read n - 1)

single :: IO ()
single = print . runEval $ rpar (sum list)

faster :: IO ()
faster = print . runEval $ do
    let (as, bs) = splitAt ((length list) `div` 2) list
    res1 <- rpar (sum as)
    res2 <- rpar (sum bs)
    return (res1 + res2)


list :: [Integer]
list = [1..10000000]

Compiled with parallelism enabled (-threaded):

C:\Users\k\Workspace\parallel_concurrent_haskell>ghc Sum.hs -O2 -threaded -rtsopts
[1 of 1] Compiling Main             ( Sum.hs, Sum.o )
Linking Sum.exe ...

Results of the single program:

C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 1 +RTS -s -N2
50000005000000
     960,065,896 bytes allocated in the heap
         363,696 bytes copied during GC
          43,832 bytes maximum residency (2 sample(s))
          57,016 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1837 colls,  1837 par    0.00s    0.01s     0.0000s    0.0007s
  Gen  1         2 colls,     1 par    0.00s    0.00s     0.0002s    0.0003s

  Parallel GC work balance: 0.18% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.27s  (  0.27s elapsed)
  GC      time    0.00s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.27s  (  0.28s elapsed)

  Alloc rate    3,614,365,726 bytes per MUT second

  Productivity 100.0% of total user, 95.1% of total elapsed

gc_alloc_block_sync: 573
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Running with faster:

C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 2 +RTS -s -N2
50000005000000
   1,600,100,336 bytes allocated in the heap
   1,477,564,464 bytes copied during GC
     400,027,984 bytes maximum residency (14 sample(s))
      70,377,336 bytes maximum slop
             911 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3067 colls,  3067 par    1.05s    0.68s     0.0002s    0.0021s
  Gen  1        14 colls,    13 par    1.98s    1.53s     0.1093s    0.5271s

  Parallel GC work balance: 0.00% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.38s  (  1.74s elapsed)
  GC      time    3.03s  (  2.21s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    3.42s  (  3.95s elapsed)

  Alloc rate    4,266,934,229 bytes per MUT second

  Productivity  11.4% of total user, 9.9% of total elapsed

gc_alloc_block_sync: 335
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Why does single complete in 0.28 seconds, while faster (badly named, apparently) takes 3.95 seconds?

2 answers:

Answer 0 (score: 5)

I'm no expert in Haskell-specific profiling, but I can see several possible problems in faster. You are walking the input list at least three times: once to get its length, once in splitAt (perhaps twice, I'm not entirely sure how it is implemented), and then once more to read and sum its elements. In single, the list is only walked once.
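To make those extra traversals concrete, here is a minimal sketch of splitting a list in a single pass by dealing elements alternately into two halves, so neither length nor splitAt has to walk the spine first (splitHalves is a made-up name, not part of the question; for a sum the order of the elements does not matter):

-- Hypothetical one-pass split: deal elements alternately into two lists,
-- so no separate length/splitAt walk over the spine is needed.
splitHalves :: [a] -> ([a], [a])
splitHalves []     = ([], [])
splitHalves (x:xs) = let (as, bs) = splitHalves xs in (x : bs, as)

With that, faster could spark sum as and sum bs over splitHalves list without ever computing length list.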

You are also holding the entire list in memory at once with faster, whereas with single Haskell can process it lazily and the GC can collect it as you go. If you look at the profiling output, you can see that faster copies far more bytes during GC: over 3,000 times as many! faster also needed 400 MB of memory at once, while single only ever needed about 40 KB at a time, so the garbage collector had a much larger space to keep scanning.
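As a rough illustration of that residency difference (strictSum is a hypothetical name, not taken from either program): a strict left fold over a lazily generated range keeps only the running total live, so cells become garbage as soon as they are consumed, which is the kind of behaviour behind the tiny residency of the single run:

import Data.List (foldl')

-- Sum a range with a strict accumulator; each cell of the lazily generated
-- list can be collected right after it is added, so residency stays small.
strictSum :: Integer -> Integer -> Integer
strictSum lo hi = foldl' (+) 0 [lo .. hi]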

Another big issue: you allocate a huge number of new cons cells in faster to hold the two intermediate sub-lists. Even if they can all be GC'd immediately, that is a lot of time spent allocating, more expensive than the additions themselves! So even before the summing starts, faster is already "over budget" compared with single.
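A hedged sketch of one way to avoid materialising two huge sub-lists at all, assuming the parBuffer combinator from Control.Parallel.Strategies (chunkedSum and the local chunks helper are made-up names): sum the list as a stream of smaller chunk sums, sparking only a bounded number of chunks ahead of the consumer:

import Control.Parallel.Strategies (parBuffer, rseq, using)

-- Sum the list as a series of chunk sums evaluated in parallel. parBuffer
-- only sparks a few chunk sums ahead of the outer sum, so only a handful of
-- chunks are live at any moment instead of two half-lists of 5,000,000 cells.
chunkedSum :: Int -> [Integer] -> Integer
chunkedSum n xs = sum (map sum (chunks xs) `using` parBuffer 4 rseq)
  where
    chunks [] = []
    chunks ys = let (c, rest) = splitAt n ys in c : chunks rest

Called as, say, chunkedSum 500000 [1 .. 10000000], this still copies each chunk once via splitAt, but it never has to keep half the list alive the way faster does.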

Answer 1 (score: 2)

Following on from amalloy's answer... my machine is slower than yours, and running your code gave:

     Total   time    0.41s  (  0.35s elapsed)

I tried:

list   = [1 .. 10000000]
list1  = [1 .. 5000000]
list2  = [5000001 .. 10000000]

fastest :: IO ()
fastest = print . runEval $ do
    res1 <- rpar (sum list1)
    res2 <- rpar (sum list2)
    return (res1 + res2)

and I got:

c:\Users\peter\Documents\Haskell\practice>parlist 4 +RTS -s -N2
parlist 4 +RTS -s -N2
50000005000000
    960,068,544 bytes allocated in the heap
       1,398,472 bytes copied during GC
          43,832 bytes maximum residency (3 sample(s))
         203,544 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1836 colls,  1836 par    0.00s    0.01s     0.0000s    0.0009s
  Gen  1         3 colls,     2 par    0.00s    0.00s     0.0002s    0.0004s

  Parallel GC work balance: 0.04% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.31s  (  0.33s elapsed)
  GC      time    0.00s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.31s  (  0.35s elapsed)

  Alloc rate    3,072,219,340 bytes per MUT second

  Productivity 100.0% of total user, 90.1% of total elapsed

which is faster...
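One more thing the numbers above show: every run reports 0 sparks converted, so neither rpar computation was ever actually picked up by the second core. A hedged sketch of the usual rpar/rpar/rseq/rseq pattern, which forces both results inside the Eval computation instead of returning an unevaluated thunk (fastestSeq is a made-up name):

import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Hypothetical variant of fastest: spark both half-sums, then rseq each one
-- so both are evaluated before runEval returns, rather than leaving a thunk
-- that the main thread only forces later, by which time the sparks have
-- fizzled or been GC'd.
fastestSeq :: IO ()
fastestSeq = print . runEval $ do
    res1 <- rpar (sum [1 .. 5000000 :: Integer])
    res2 <- rpar (sum [5000001 .. 10000000])
    _    <- rseq res1
    _    <- rseq res2
    return (res1 + res2)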