Haskell - What is the point of parMap?

Date: 2017-12-23 21:48:33

Tags: haskell parallel-processing


-- Vector version
import Control.Parallel.Strategies
import qualified Data.Vector as V
import Data.Vector (Vector, cons, empty, snoc)

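-- Map f over the vector, then spark the evaluation of every element with rpar
parMapVec :: (a -> b) -> Vector a -> Vector b
parMapVec f v = runEval $ evalTraversable rpar $ V.map f v

-- Inclusive range from x to y (counts down when x > y)
range :: Integer -> Integer -> Vector Integer
range x y
  | x == y = x `cons` empty
  | x < y  = x `cons` (range (x + 1) y)
  | x > y  = (range x (y + 1)) `snoc` y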

fac :: Integer -> Integer
fac n
  | n < 2     = 1
  | otherwise = n * (fac $ n - 1)

main :: IO ()
main = do
  let result = runEval $ do
        let calc = parMapVec fac $ 80000 `range` 80007
        rseq calc
        return calc
  putStrLn $ show result


-- List version: same imports and fac as above; only main changes
main :: IO ()
main = do
  let result = runEval $ do
        let calc = parMap rpar fac [80000..80007]
        rseq calc
        return calc
  putStrLn $ show result
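To make concrete what parMap does with each element, here is a minimal hand-written version in the Eval monad. The name parMapEval is hypothetical, and it is only roughly equivalent to the library's parMap rseq (one spark per element, each evaluated to weak head normal form); the library itself builds parMap from parList. fac is rewritten with product for brevity but gives the same results as the question's version.

```haskell
-- Hypothetical hand-rolled parMap: one spark per list element.
import Control.Parallel.Strategies (Eval, rpar, runEval)

parMapEval :: (a -> b) -> [a] -> Eval [b]
parMapEval _ []       = return []
parMapEval f (x : xs) = do
  y  <- rpar (f x)         -- spark f x; an idle capability may evaluate it to WHNF
  ys <- parMapEval f xs    -- go on sparking the rest of the list
  return (y : ys)

-- Same factorial as in the question, written with product for brevity.
fac :: Integer -> Integer
fac n = product [1 .. n]

main :: IO ()
main = print (runEval (parMapEval fac [10 .. 13]))
```

The sparks only change when the work may happen, never the result, so this prints the factorials of 10 to 13 in order whether or not any spark runs in parallel. Note that parMap rpar, as used in the question, sparks each element twice (the outer spark created by parList plus the inner rpar), which is consistent with the 16 sparks reported for 8 elements in the second set of statistics; parMap rseq is the more usual choice.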

I compiled with ghc --make parVectorTest.hs -threaded -rtsopts and ran it with ./parVectorTest -s.


56,529,547,832 bytes allocated in the heap
10,647,896,984 bytes copied during GC
    7,281,792 bytes maximum residency (16608 sample(s))
    3,285,392 bytes maximum slop
            21 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
Gen  0     82708 colls,     0 par    0.828s   0.802s     0.0000s    0.0016s
Gen  1     16608 colls,     0 par   15.006s  14.991s     0.0009s    0.0084s

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

SPARKS: 8 (7 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

INIT    time    0.001s  (  0.001s elapsed)
MUT     time    5.368s  (  5.369s elapsed)
GC      time   15.834s  ( 15.793s elapsed)
EXIT    time    0.001s  (  0.000s elapsed)
Total   time   21.206s  ( 21.163s elapsed)

Alloc rate    10,530,987,847 bytes per MUT second

Productivity  25.3% of total user, 25.4% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0


56,529,535,488 bytes allocated in the heap
12,483,967,024 bytes copied during GC
    6,246,872 bytes maximum residency (19843 sample(s))
    2,919,544 bytes maximum slop
            20 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
Gen  0     79459 colls,     0 par    0.818s   0.786s     0.0000s    0.0009s
Gen  1     19843 colls,     0 par   17.725s  17.709s     0.0009s    0.0087s

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

SPARKS: 16 (14 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)

INIT    time    0.001s  (  0.001s elapsed)
MUT     time    5.394s  (  5.400s elapsed)
GC      time   18.543s  ( 18.495s elapsed)
EXIT    time    0.000s  (  0.000s elapsed)
Total   time   23.940s  ( 23.896s elapsed)

Alloc rate    10,479,915,927 bytes per MUT second

Productivity  22.5% of total user, 22.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0


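Because of the answers to this question, I also ran both tests with ./parVector -s -C0.01, with essentially the same results. I'm using a Lenovo IdeaPad with 8 cores, running Ubuntu Linux 17.04. While testing, the only applications I had open were VS Code and my system monitor, although other processes were using a small fraction of the processing power. Does the processor have to be completely idle for sparks to run?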

1 Answer:

Answer 0 (score: 5)

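By default, GHC runs every program on a single OS thread, even when -threaded is enabled. Note the text "using -N1" in your output: it means the runtime has only 1 capability, so only one OS thread executes Haskell code at a time and your sparks are evaluated one after another rather than in parallel.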

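In short: pass, for example, +RTS -N8 to your program (or just +RTS -N to let the runtime use all available cores). For documentation on this flag, see here.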
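If you would rather not pass RTS flags on every run, the capability count can also be inspected and set from inside the program. A minimal sketch, assuming the functions getNumCapabilities and setNumCapabilities from Control.Concurrent and getNumProcessors from GHC.Conc; the program must still be compiled with -threaded, otherwise it stays at one capability. The printed labels are only illustrative.

```haskell
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  before <- getNumCapabilities   -- 1 unless -N was given: the "using -N1" in the stats
  procs  <- getNumProcessors     -- processors (cores) the RTS can see
  setNumCapabilities procs       -- roughly what +RTS -N does; needs -threaded
  after  <- getNumCapabilities
  putStrLn ("capabilities before: " ++ show before)
  putStrLn ("processors:          " ++ show procs)
  putStrLn ("capabilities after:  " ++ show after)
```

On the asker's 8-core machine, a -threaded build would be expected to report 1 capability before and 8 after; the command-line flag remains the simpler option when you control how the program is launched.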

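Broadly speaking, this comes down to the distinction between parallelism and concurrency. Here are some questions that all try to explain the difference. It can be summarized as:

  • parallelism: one task is subdivided into similar chunks that can run simultaneously on different cores/CPUs at some point in time; the goal is speed

  • concurrency: several tasks execute conceptually independently, so that their execution times overlap, whether by time-slicing on the same thread or on separate cores/CPUs; the goal is usually more efficient use of shared resources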
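A small sketch contrasting the two ideas, assuming the parallel package (parMap, rseq) and base's forkIO and MVar. The helper names chunks and sumSquaresPar and the chunk size of 100 are hypothetical, chosen only for illustration.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Parallel.Strategies (parMap, rseq)

-- Split a list into consecutive chunks of at most n elements.
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n ys
  | n <= 0    = [ys]
  | otherwise = let (a, b) = splitAt n ys in a : chunks n b

-- Parallelism: one task (a sum of squares) split into similar chunks that
-- the RTS may evaluate on different cores; the result is the same either way.
sumSquaresPar :: [Int] -> Int
sumSquaresPar xs =
  sum (parMap rseq (\c -> sum (map (^ (2 :: Int)) c)) (chunks 100 xs))

-- Concurrency: two independent tasks whose executions overlap in time;
-- this works even with -N1, where the RTS time-slices between threads.
main :: IO ()
main = do
  print (sumSquaresPar [1 .. 1000])
  done <- newEmptyMVar
  _ <- forkIO $ do
    threadDelay 1000                  -- stand-in for waiting on I/O
    putStrLn "background thread done"
    putMVar done ()
  putStrLn "main thread keeps going"
  takeMVar done                       -- wait for the forked thread to finish
```

The parallel part only gets faster with more than one capability (+RTS -N), while the concurrent part behaves the same with -N1; the order of the two last lines of output is not guaranteed.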
