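Multithreaded subset sum using Haskell parallel strategies

Time: 2015-12-05 04:16:53

Tags: multithreading haskell parallel-processing

I'm trying to parallelize my subset-sum solver with Control.Parallel.Strategies and need a little help understanding what is going on.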

Problem

Find a subset of the numbers in numbers :: [Int] whose sum is 100000000.
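
To make the problem statement concrete, here is a minimal brute-force sketch on a toy input. The names toyNumbers and subsetsWithSum and the target 10 are made up for illustration and are not part of my program; the approach (enumerate every subsequence) is only practical for very short lists.

```haskell
import Data.List (subsequences)

-- Hypothetical toy input, standing in for the real `numbers` list.
toyNumbers :: [Int]
toyNumbers = [2, 3, 5, 8]

-- Every subset (as a subsequence) whose elements add up to the target.
subsetsWithSum :: Int -> [Int] -> [[Int]]
subsetsWithSum target = filter ((== target) . sum) . subsequences

main :: IO ()
main = print (subsetsWithSum 10 toyNumbers)  -- prints [[2,3,5],[2,8]]
```

The real program differs in that it searches one subset size at a time, starting from the largest, and stops at the first hit.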

Single-threaded solution:

import Data.List (find)
import Data.Maybe (fromJust, isJust)

numbers = [14920416,14602041,14088921,13371291,13216099,12153625,10896437
          ,10884343,10228468,10177453,9998564,9920883,9511265,8924305
          ,8452302,8103727,7519471,7043381,7028847,6418450,6222190,6215767
          ,6190960,5514135,4798322,3823984,3247980,837289] :: [Int]

subsequencesOfSize :: Int -> [Int] -> [[Int]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
  where
    subsequencesBySize [] = [[[]]]
    subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

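-- Find a subset of exactly n elements of xs whose sum equals the target.
-- (The list argument is called xs so it does not shadow Prelude's seq.)
subsetSum :: [Int] -> Int -> Maybe [Int]
subsetSum xs n = find ((==target) . sum) (subsequencesOfSize n xs)
  where target = 100000000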

solve = map (subsetSum numbers) [n,n-1 .. 1]
  where n = (length numbers)

main = do
  print $ fromJust $ find isJust solve
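
The index `!! (l-n)` in subsequencesOfSize relies on subsequencesBySize grouping the subsets by size in descending order. A small sketch that makes this visible, with the helper lifted to the top level (same definition as above, applied to a toy list chosen only for illustration):

```haskell
-- Same helper as in the program above, moved to the top level for inspection.
subsequencesBySize :: [a] -> [[[a]]]
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) =
  let next = subsequencesBySize xs
  in zipWith (++) ([] : next) (map (map (x :)) next ++ [[]])

main :: IO ()
main =
  -- Index 0 holds the subsets of size 3, index 3 the single subset of size 0.
  mapM_ print (subsequencesBySize [1, 2, 3 :: Int])
  -- prints:
  -- [[1,2,3]]
  -- [[2,3],[1,3],[1,2]]
  -- [[3],[2],[1]]
  -- [[]]
```

So for a list of length l, the subsets of size n sit at index l-n.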

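Parallel strategy

Since I already compute the subsets of each size n separately, I figured I could use parMap to spark the computation for each size-n list of subsets in parallel. I replaced the map in the solve function as follows: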

import Control.Parallel.Strategies

solve = parMap rpar (subsetSum numbers) [n,n-1 .. 1]
  where n = (length numbers)
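
For reference, my understanding of what parMap rpar does: it creates one spark per list element and evaluates each element only to weak head normal form. The sketch below shows the equivalent spelling with `using` and parList, plus an rdeepseq variant that evaluates each element fully. The function work and the input [1 .. 8] are placeholders I made up, not part of my program. It needs the parallel package, and sparks only run on other cores when the program is built with -threaded and started with +RTS -N.

```haskell
import Control.Parallel.Strategies (parList, parMap, rdeepseq, rpar, using)

-- Placeholder for an expensive computation (hypothetical).
work :: Int -> Int
work n = sum [1 .. n * 100000]

-- One spark per element; each element is evaluated to WHNF.
viaParMap :: [Int]
viaParMap = parMap rpar work [1 .. 8]

-- The same thing written out with `using` and `parList`.
viaUsing :: [Int]
viaUsing = map work [1 .. 8] `using` parList rpar

-- Variant that evaluates each element to normal form (needs NFData).
viaDeep :: [Int]
viaDeep = parMap rdeepseq work [1 .. 8]

main :: IO ()
main = print (viaParMap == viaUsing && viaUsing == viaDeep)  -- prints True
```

All three produce the same list; the strategies only change when and where the elements get evaluated. For a Maybe [Int] result like the one from subsetSum, WHNF means deciding between Just and Nothing, which as far as I can tell already forces find to run until it hits a match or exhausts the list.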

Single core

newproblem +RTS -p -N1 -RTS

    total time  =       35.05 secs   (35047 ticks @ 1000 us, 1 processor)
    total alloc = 22,628,052,232 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE  %time %alloc

subsetSum                             Main     86.6   24.5
subsequencesOfSize.subsequencesBySize Main     11.0   75.5
solve                                 Main      2.4    0.0

Two cores

newproblem +RTS -p -N2 -RTS

    total time  =       28.80 secs   (57590 ticks @ 1000 us, 2 processors)
    total alloc = 26,537,237,440 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE  %time %alloc

subsetSum                             Main     70.2   21.4
subsequencesOfSize.subsequencesBySize Main     28.8   78.6

Four cores

newproblem +RTS -p -N4 -RTS

    total time  =       26.68 secs   (106727 ticks @ 1000 us, 4 processors)
    total alloc = 35,925,142,744 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE  %time %alloc

subsetSum                             Main     68.2   22.4
subsequencesOfSize.subsequencesBySize Main     30.8   77.6
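
To put numbers on the three profiles above, here is a small calculation of speedup (T1 / TN) and parallel efficiency (speedup divided by core count). The timings are copied from the "total time" lines; the helper names are mine.

```haskell
-- Wall-clock totals from the three profiles above, in seconds.
timings :: [(Int, Double)]
timings = [(1, 35.05), (2, 28.80), (4, 26.68)]

-- Speedup relative to the single-core run: T1 / TN.
speedup :: Double -> Double
speedup t = 35.05 / t

-- Parallel efficiency: speedup divided by the number of cores.
efficiency :: Int -> Double -> Double
efficiency cores t = speedup t / fromIntegral cores

main :: IO ()
main = mapM_ report timings
  where
    report (cores, t) =
      putStrLn (show cores ++ " core(s): speedup " ++ show (speedup t)
                ++ ", efficiency " ++ show (efficiency cores t))
```

That works out to roughly 1.22x on two cores and 1.31x on four, so efficiency drops from about 0.61 to about 0.33. The tick counts tell the same story: summed over all cores they grow from 35047 to 57590 to 106727, so total CPU time goes up while wall-clock time goes down only slightly.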

[Image: threadscope comparisons]

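As you can see, the program runs faster on 2 or 4 cores than on a single core (28.80 s and 26.68 s versus 35.05 s). However, I don't believe the lists of subsets of size n are being processed by separate processors the way I want them to be.

Looking at ThreadScope, it seems to me that each "bump" in processor activity is the computation for one subset size n. What I expected was not a shorter runtime for each "bump", but the "bumps" being spawned in parallel, one per processor. However, the former description is more accurate than the latter.

What is going on here? Where does the speedup come from? Why is there so much garbage collection between the subset computations?

Thanks in advance for any enlightenment :D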
