Question

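I have some Haskell code that does a lot of mutually independent work on a large (65.5k-element) list of items. This seemed like an ideal candidate for parallelization, which I did using Control.Parallel.Strategies.parBuffer. That helped, but I'm sure the work is too fine-grained, and I would also like to process the list in chunks (as Control.Parallel.Strategies.parListChunk does). However, because my list is large, experiments using parListChunk alone did not yield much of a speedup, because the entire 65k+ item list has to be evaluated for it to work (as the program's memory usage showed).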

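Is there a way to write a Strategy that gives me the benefits of both parBuffer (i.e. the list is treated as a lazy buffer with a controllable amount of evaluation) and parListChunk (i.e. the work is split into pieces made up of several list elements rather than individual ones)? I'm not sure how to do this.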

Edit: By request, here is what I'm currently using, with explanatory comments:

parBufferMap :: Int -> Strategy b -> (a -> b) -> [a] -> [b]
parBufferMap i strat f = withStrategy (parBuffer i strat) . fmap f

main :: IO ()
main = do
  let allTables = genAllTables 4 -- a list of 65.5k Tables               
  let results = parBufferMap 512 rdeepseq theNeedful allTables -- theNeedful is what I need to do to each Table, independently of each other
  let indexed = zip [1..] results
  let stringified = stringify <$> indexed -- make them pretty for output
  void . traverse putStrLn $ stringified -- actually print them

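My goal is to replace the results computation as it stands (using only parBufferMap) with something that combines the benefits of parBufferMap and parListChunk.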

Answer 1

So it seems you want to compute:

map theNeedful allTables

But you want to do the map in batches of 512 tables.

Does something like this look like it would work for you?

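import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)
import Data.List.Split (chunksOf)

-- assuming:
theNeedful :: Table -> Result

nthreads = 4   -- number of threads to keep busy
allTables = ...
allBatches = chunksOf 512 allTables  -- from Data.List.Split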

doBatch :: [Table] -> [Result]
doBatch tables = map theNeedful tables

results :: [Result]
results = concat $ withStrategy (parBuffer nthreads rdeepseq) (map doBatch allBatches)
...

In words:

break the tables up into chunks of 512 tables
map doBatch over all the batches
run parBuffer over that list of computations
concat the resulting lists
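
The four steps above can be folded into one reusable function. The following is only a minimal sketch under some assumptions: the names parBufferChunkMap and chunksOfLazy are hypothetical (they are not part of the parallel or split packages), the squaring function merely stands in for theNeedful, and the result type is assumed to have an NFData instance so that rdeepseq can force each chunk completely.

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)

-- Hypothetical helper: lazily split a list into chunks of at most n elements.
-- (Data.List.Split.chunksOf from the split package does the same job.)
chunksOfLazy :: Int -> [a] -> [[a]]
chunksOfLazy n = go
  where
    size = max 1 n  -- guard against non-positive chunk sizes
    go [] = []
    go xs = let (chunk, rest) = splitAt size xs in chunk : go rest

-- Hypothetical combinator: map f over the list in chunks of chunkSize,
-- sparking each chunk as one unit of work, with parBuffer keeping
-- bufSize chunks sparked ahead of the consumer.
parBufferChunkMap :: NFData b => Int -> Int -> (a -> b) -> [a] -> [b]
parBufferChunkMap bufSize chunkSize f =
  concat
    . withStrategy (parBuffer bufSize rdeepseq)
    . map (map f)
    . chunksOfLazy chunkSize

main :: IO ()
main = do
  -- The squaring function is a stand-in for theNeedful.
  let results = parBufferChunkMap 4 512 (\x -> x * x) [1 .. 65536 :: Int]
  print (take 5 results)  -- prints [1,4,9,16,25]
```

To actually run the sparks on several cores, the program needs to be compiled with -threaded and run with +RTS -N. Both the buffer size and the chunk size are tuning knobs, so it is probably worth measuring a few combinations rather than assuming that 4 and 512 are right for a given workload.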

How can I combine the benefits of ``parBuffer`` and ``parListChunk``?
