I have an algorithm that sorts a list of a given length in parallel:
import Control.Parallel (par, pseq)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Environment (getArgs)
import System.Random (StdGen, getStdGen, randoms)

parSort :: (Ord a) => [a] -> [a]
parSort (x:xs) = force greater `par` (force lesser `pseq`
                                      (lesser ++ x:greater))
    where lesser  = parSort [y | y <- xs, y <  x]
          greater = parSort [y | y <- xs, y >= x]
parSort _ = []

sort :: (Ord a) => [a] -> [a]
sort (x:xs) = lesser ++ x:greater
    where lesser  = sort [y | y <- xs, y <  x]
          greater = sort [y | y <- xs, y >= x]
sort _ = []

parSort2 :: (Ord a) => Int -> [a] -> [a]
parSort2 d list@(x:xs)
  | d <= 0    = sort list
  | otherwise = force greater `par` (force lesser `pseq`
                                     (lesser ++ x:greater))
  where lesser  = parSort2 d' [y | y <- xs, y <  x]
        greater = parSort2 d' [y | y <- xs, y >= x]
        d' = d - 1
parSort2 _ _ = []

force :: [a] -> ()
force xs = go xs `pseq` ()
    where go (_:xs) = go xs
          go []     = 1

randomInts :: Int -> StdGen -> [Int]
randomInts k g = let result = take k (randoms g)
                 in force result `seq` result

testFunction = parSort

main = do
  args <- getArgs
  let count | null args = 500000
            | otherwise = read (head args)
  input <- randomInts count `fmap` getStdGen
  start <- getCurrentTime
  let sorted = testFunction input
  putStrLn $ "Sort list N = " ++ show (length sorted)
  end <- getCurrentTime
  putStrLn $ show (end `diffUTCTime` start)
I want the parallel sort running on 2, 3, and 4 processor cores to take less time than on 1 core. At the moment I cannot get that result. Here are my program runs:
1. SortList +RTS -N1 -RTS 10000000
   time = 41.2 s
2. SortList +RTS -N3 -RTS 10000000
   time = 39.55 s
3. SortList +RTS -N4 -RTS 10000000
   time = 54.2 s
What should I do?
Update 1:
testFunction = parSort2 60
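(Assuming the binary is built with -threaded and -rtsopts, one quick way to confirm that the +RTS -N flag is really taking effect is to print the capability count, e.g. with a small helper like the following; this is illustrative only and not part of the program above.)

import Control.Concurrent (getNumCapabilities)

-- Illustrative check, not part of the program above: report how many
-- capabilities the RTS was started with (+RTS -N<k>).
reportCapabilities :: IO ()
reportCapabilities = do
  caps <- getNumCapabilities
  putStrLn $ "RTS capabilities: " ++ show caps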
Answer (score 1):
Here's an idea you can play with, using Data.Map. For simplicity and performance, I assume that equal elements are interchangeable, so we can count occurrences instead of storing lists of elements. I'm sure you could get better results with some fancy array algorithms, but this is simple and (essentially) works.
When writing a parallel algorithm, we want to minimize the amount of work that has to be done sequentially. When sorting a list, there is one thing we really cannot avoid doing sequentially: splitting the list into pieces for multiple threads to work on. We would like to get that done with as little effort as possible, and then try to work mostly in parallel from there on.
Let's start with a simple sequential algorithm.
{-# language BangPatterns, TupleSections #-}
import qualified Data.Map.Strict as M
import Data.Map (Map)
import Data.List
import Control.Parallel.Strategies

type Bag a = Map a Int

ssort :: Ord a => [a] -> [a]
ssort xs =
  let m = M.fromListWith (+) $ (,1) <$> xs
  in concat [replicate c x | (x,c) <- M.toList m]
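As a quick illustration of the counting representation (the example values here are my own, not part of the algorithm):

-- Illustrative only: equal elements are collapsed into counts,
-- and ssort expands them back in order.
checkSsort :: Bool
checkSsort =
  ssort "banana" == "aaabnn"
    && M.toList (M.fromListWith (+) ((,1) <$> "banana"))
       == [('a',3),('b',1),('n',2)]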
How do we parallelize this? First, let's split the list into pieces. There are various ways to do that, none of them great. Assuming a small number of capabilities, I think it is reasonable to let each of them walk the list itself. Feel free to experiment with other approaches.
-- | Every Nth element, including the first
everyNth :: Int -> [a] -> [a]
everyNth n | n <= 0 = error "What you doing?"
everyNth n = go 0 where
  go !_ [] = []
  go 0 (x : xs) = x : go (n - 1) xs
  go k (_ : xs) = go (k - 1) xs

-- | Divide up a list into N pieces fairly. Walking each list in the
-- result will walk the original list.
splatter :: Int -> [a] -> [[a]]
splatter n = map (everyNth n) . take n . tails
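To make the splitting concrete, here is what it does on a small input (the example is mine, worked out by hand):

-- Illustrative only: splatter deals the list out round-robin, so
-- splatter 3 "abcdefgh" == map (everyNth 3) ["abcdefgh","bcdefgh","cdefgh"].
checkSplatter :: Bool
checkSplatter = splatter 3 "abcdefgh" == ["adg","beh","cf"]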
Now that we have the pieces of the list, we fire off threads to turn them into bags.
parMakeBags :: Ord a => [[a]] -> Eval [Bag a]
parMakeBags xs =
  traverse (rpar . M.fromListWith (+)) $ map (,1) <$> xs
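In isolation you could run this with runEval, e.g. via a tiny wrapper like the following (bagsOf is a hypothetical helper, just to show the plumbing):

-- Hypothetical helper: split a list into n pieces and spark one
-- bag-building task per piece.
bagsOf :: Ord a => Int -> [a] -> [Bag a]
bagsOf n = runEval . parMakeBags . splatter n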
Now we can repeatedly merge pairs of bags until we have just one.
parMergeBags_ :: Ord a => [Bag a] -> Eval (Bag a)
parMergeBags_ [] = pure M.empty
parMergeBags_ [t] = pure t
parMergeBags_ q = parMergeBags_ =<< go q where
  go [] = pure []
  go [t] = pure [t]
  go (t1:t2:ts) = (:) <$> rpar (M.unionWith (+) t1 t2) <*> go ts
But... there is a problem. In each round of merging we use only half as many capabilities as in the previous round, and the final merge uses just one capability. Ouch! To fix this, we need to parallelize unionWith. Fortunately, that is easy!
import Data.Map.Internal (Map (..), splitLookup, link)

parUnionWith
  :: Ord k
  => (v -> v -> v)
  -> Int -- Number of threads to spark
  -> Map k v
  -> Map k v
  -> Eval (Map k v)
parUnionWith f n t1 t2 | n <= 1 = rseq $ M.unionWith f t1 t2
parUnionWith _ !_ Tip t2 = rseq t2
parUnionWith _ !_ t1 Tip = rseq t1
parUnionWith f n (Bin _ k1 x1 l1 r1) t2 = case splitLookup k1 t2 of
  (l2, mb, r2) -> do
    l1l2 <- parEval $ parUnionWith f (n `quot` 2) l1 l2
    r1r2 <- parUnionWith f (n `quot` 2) r1 r2
    case mb of
      Nothing -> rseq $ link k1 x1 l1l2 r1r2
      Just x2 -> rseq $ link k1 fx1x2 l1l2 r1r2
        where !fx1x2 = f x1 x2
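As a sanity check, the parallel union should agree exactly with the sequential one; a small (hypothetical) test along these lines works:

-- Hypothetical check: parUnionWith must compute the same map as
-- M.unionWith, regardless of how the work is split across sparks.
checkParUnion :: Bool
checkParUnion =
  runEval (parUnionWith (+) 4 m1 m2) == M.unionWith (+) m1 m2
  where
    m1, m2 :: Map Int Int
    m1 = M.fromList [(k, k) | k <- [1 .. 100]]
    m2 = M.fromList [(k, 1) | k <- [50 .. 150]]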
Now we can parallelize the bag merging fully:
-- Uses the given number of capabilities per merge, initially,
-- doubling for each round.
parMergeBags :: Ord a => Int -> [Bag a] -> Eval (Bag a)
parMergeBags !_ [] = pure M.empty
parMergeBags !_ [t] = pure t
parMergeBags n q = parMergeBags (n * 2) =<< go q where
  go [] = pure []
  go [t] = pure [t]
  go (t1:t2:ts) = (:) <$> parEval (parUnionWith (+) n t1 t2) <*> go ts
and then we can implement the parallel merge like this:
parMerge :: Ord a => [[a]] -> Eval [a]
parMerge xs = do
  bags <- parMakeBags xs
  -- Why 2 and not 1? We only have half as many
  -- pairs as we have lists (capabilities we want to use),
  -- so we double up.
  m <- parMergeBags 2 bags
  pure $ concat [replicate c x | (x,c) <- M.toList m]
Putting the pieces together:
parSort :: Ord a => Int -> [a] -> Eval [a]
parSort n = parMerge . splatter n
pSort :: Ord a => Int -> [a] -> [a]
pSort n = runEval . parMerge . splatter n
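To plug this into a timing harness like the one in the question, something along these lines should do (a sketch only: mainBench, the input size, and the use of getNumCapabilities are my choices here, and the program still has to be compiled with -threaded and run with +RTS -N<k>):

import Control.Concurrent (getNumCapabilities)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Random (getStdGen, randoms)

-- Sketch of a benchmark driver; pSort is handed one piece of the
-- list per capability the RTS was started with.
mainBench :: IO ()
mainBench = do
  caps  <- getNumCapabilities
  g     <- getStdGen
  let input = take 1000000 (randoms g) :: [Int]
  start <- getCurrentTime
  let sorted = pSort caps input
  putStrLn $ "Sorted N = " ++ show (length sorted)
  end   <- getCurrentTime
  print (end `diffUTCTime` start)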
There is just one sequential piece left that we could parallelize: converting the final bag to a list. Is it worth parallelizing? I'm pretty sure that in practice it is not. But let's do it anyway, just for fun! To avoid considerable extra complexity, I assume there are no large numbers of equal elements; repeated elements in the result will leave some work (thunks) behind in the result list.
We need a basic partial list spine forcer:
-- | Force the first n conses of a list
walkList :: Int -> [a] -> ()
walkList n _ | n <= 0 = ()
walkList _ [] = ()
walkList n (_:xs) = walkList (n - 1) xs
Now we can convert the bag into a list in parallel chunks, without having to pay for concatenating them:
-- | Use up to the given number of threads to convert a bag
-- to a list, appending the final list argument.
parToListPlus :: Int -> Bag k -> [k] -> Eval [k]
parToListPlus n m lst | n <= 1 = do
  rseq (walkList (M.size m) res)
  pure res
  -- Note: the concat and ++ should fuse away when compiling with
  -- optimization.
  where res = concat [replicate c x | (x,c) <- M.toList m] ++ lst
parToListPlus _ Tip lst = pure lst
parToListPlus n (Bin _ x c l r) lst = do
  r' <- parEval $ parToListPlus (n `quot` 2) r lst
  res <- parToListPlus (n `quot` 2) l $ replicate c x ++ r'
  rseq r' -- make sure the right side is finished
  pure res
and then we modify the merge accordingly:
parMerge :: Ord a => Int -> [[a]] -> Eval [a]
parMerge n xs = do
  bags <- parMakeBags xs
  m <- parMergeBags 2 bags
  parToListPlus n m []
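Since parMerge now also takes the capability count, pSort presumably has to pass it through as well, something like:

-- Assumed update (not restated above): thread the capability count
-- into both the splitting and the final bag-to-list step.
pSort :: Ord a => Int -> [a] -> [a]
pSort n = runEval . parMerge n . splatter n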