I'm trying to parallelize my subset-sum solver using Parallel.Strategies and need a little help understanding what is happening.

The problem: find a subset of numbers :: [Int] that sums to 100000000.

The single-threaded solution:
import Data.List (find)
import Data.Maybe (fromJust, isJust)

numbers = [14920416,14602041,14088921,13371291,13216099,12153625,10896437
          ,10884343,10228468,10177453,9998564,9920883,9511265,8924305
          ,8452302,8103727,7519471,7043381,7028847,6418450,6222190,6215767
          ,6190960,5514135,4798322,3823984,3247980,837289] :: [Int]

subsequencesOfSize :: Int -> [Int] -> [[Int]]
subsequencesOfSize n xs = let l = length xs
                          in if n > l then [] else subsequencesBySize xs !! (l - n)
  where
    subsequencesBySize [] = [[[]]]
    subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

subsetSum :: [Int] -> Int -> Maybe [Int]
subsetSum seq n = find ((== target) . sum) (subsequencesOfSize n seq)
  where target = 100000000

solve = map (subsetSum numbers) [n, n-1 .. 1]
  where n = length numbers

main = do
  print $ fromJust $ find isJust solve
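For intuition about the generator, here is a tiny self-contained check (the definition above reproduced verbatim, applied to a 3-element list instead of my real input):

```haskell
-- subsequencesOfSize as defined in the post, repeated here so this
-- snippet compiles on its own.
subsequencesOfSize :: Int -> [Int] -> [[Int]]
subsequencesOfSize n xs = let l = length xs
                          in if n > l then [] else subsequencesBySize xs !! (l - n)
  where
    subsequencesBySize [] = [[[]]]
    subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

main :: IO ()
main = print (subsequencesOfSize 2 [1, 2, 3])
-- prints [[2,3],[1,3],[1,2]]
```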
Since I'm already computing the subsets of each size separately, I figured I could use parMap to spark the computation over each size-n subset list simultaneously. I replaced the map in solve like so:
import Control.Parallel.Strategies

solve = parMap rpar (subsetSum numbers) [n, n-1 .. 1]
  where n = length numbers
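One detail I'm unsure about (my own guess, not something I've confirmed): rpar only evaluates each sparked result to weak head normal form, i.e. just the outer Just/Nothing constructor of the Maybe. Below is a minimal self-contained sketch of the same parMap pattern using rdeepseq, which forces the full result inside the spark. slowFind is a toy stand-in for subsetSum, not the real program:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Toy stand-in for subsetSum: an expensive Maybe-returning search.
-- (My own invention for illustration, not code from the real program.)
slowFind :: Int -> Maybe Int
slowFind n = if s `mod` 2 == 0 then Just s else Nothing
  where s = sum [1 .. n]

main :: IO ()
main = do
  -- rdeepseq forces the payload inside each Just on the spark itself,
  -- rather than only the Maybe constructor as rpar would.
  print (parMap rdeepseq slowFind [100000, 100001, 100002])
-- prints [Just 5000050000,Nothing,Nothing]
```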
newproblem +RTS -p -N1 -RTS
total time = 35.05 secs (35047 ticks @ 1000 us, 1 processor)
total alloc = 22,628,052,232 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
subsetSum Main 86.6 24.5
subsequencesOfSize.subsequencesBySize Main 11.0 75.5
solve Main 2.4 0.0
newproblem +RTS -p -N2 -RTS
total time = 28.80 secs (57590 ticks @ 1000 us, 2 processors)
total alloc = 26,537,237,440 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
subsetSum Main 70.2 21.4
subsequencesOfSize.subsequencesBySize Main 28.8 78.6
newproblem +RTS -p -N4 -RTS
total time = 26.68 secs (106727 ticks @ 1000 us, 4 processors)
total alloc = 35,925,142,744 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
subsetSum Main 68.2 22.4
subsequencesOfSize.subsequencesBySize Main 30.8 77.6
As you can see, running the program on 2 or 4 cores is faster than on a single core. However, I don't believe the individual size-n subset lists are being handled by separate processors, as I intended.
Looking at ThreadScope, it appears to me that each "bump" in processor activity is the computation over the subsets of one size n. I expected not a reduction in the runtime of each "bump", but rather the "bumps" to be produced in parallel across the processors. However, the former is a more accurate description of what happens than the latter.
What is going on here? Where is the speedup coming from? Why is so much garbage collection happening between the subset computations?

Thanks in advance for any enlightenment :D