Is there a version of Haskell's Control.Concurrent.Async.mapConcurrently with a concurrency limit?

Asked: 2013-09-19 13:33:46

Tags: haskell asynchronous concurrency io

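I am trying to run multiple downloads in parallel in Haskell, for which I would normally just use the Control.Concurrent.Async.mapConcurrently function. However, doing so opens ~3000 connections, which causes the web server to reject them all. Is it possible to accomplish the same task as mapConcurrently, but with only a limited number of connections open at a time (i.e. only 2 or 4 at a time)?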

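5 answers:

Answer 0 (score: 19)

A quick solution is to use a semaphore to limit the number of concurrent actions. It is not optimal (all threads are created at once and then wait for a free slot), but it works: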

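import Control.Concurrent.MSem  -- from the SafeSemaphore package
import Control.Concurrent.Async
import Control.Concurrent (threadDelay)
import qualified Data.Traversable as T

-- Like mapConcurrently, but at most 'limit' actions run at the same time.
mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPool limit f xs = do
    sem <- new limit
    -- 'with' takes a unit from the semaphore and returns it when the action ends
    mapConcurrently (with sem . f) xs

-- A little test: each action sleeps for one second, at most ten run at once.
main = mapPool 10 (\x -> threadDelay 1000000 >> print x) [1..100]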
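
The same semaphore idea can be sketched without the extra package, using QSem from base together with bracket_ so that a slot is returned even when an action throws. The name mapPoolQSem is made up for this illustration; it is not part of async or base.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- | Like 'mapConcurrently', but at most @limit@ actions run at any one time.
-- 'mapPoolQSem' is a hypothetical name chosen for this sketch.
-- As in the answer above, one thread per element is still created up front;
-- the semaphore only limits how many of them are doing work.
mapPoolQSem :: Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPoolQSem limit f xs = do
    sem <- newQSem limit
    -- bracket_ acquires a slot before the action and releases it afterwards,
    -- including when the action fails with an exception.
    mapConcurrently (bracket_ (waitQSem sem) (signalQSem sem) . f) xs

main :: IO ()
main = do
    results <- mapPoolQSem 4 (\x -> pure (x * 2)) [1 .. 10 :: Int]
    print results  -- prints [2,4,6,8,10,12,14,16,18,20]
```

Results come back in input order because mapConcurrently preserves the shape of the container it traverses.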

Answer 1 (score: 9)

You could also try the pooled-io package, where you can write:

import qualified Control.Concurrent.PooledIO.Final as Pool
import Control.DeepSeq (NFData)
import Data.Traversable (Traversable, traverse)

mapPool ::
   (Traversable t, NFData b) =>
   Int -> (a -> IO b) -> t a -> IO (t b)
mapPool n f = Pool.runLimited n . traverse (Pool.fork . f)

Answer 2 (score: 2)

This is very easy with the Control.Concurrent.Spawn library:

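import Control.Concurrent.Spawn  -- from the spawn package

type URL      = String
type Response = String

numMaxConcurrentThreads :: Int
numMaxConcurrentThreads = 4

getURLs :: [URL] -> IO [Response]
getURLs urlList = do
   -- 'pool' returns a wrapper that lets at most n wrapped actions run at once
   wrap <- pool numMaxConcurrentThreads
   parMapIO (wrap . fetchURL) urlList

-- The download itself is not shown in this answer; supply your own definition.
fetchURL :: URL -> IO Response
fetchURL = undefined  -- placeholder so that the module compiles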

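Answer 3 (score: 1)

If some of the threads run significantly longer than others, chunking them can be inefficient, because every chunk waits for its slowest member. Here is a smoother, but more complex, solution: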

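{-# LANGUAGE TupleSections #-}
import Control.Concurrent.Async (Async, async, waitAny)
import Data.List                (delete, sortBy)
import Data.Ord                 (comparing)

-- Run the tasks with at most n in flight (n >= 1); results keep the input order.
concurrentlyLimited :: Int -> [IO a] -> IO [a]
concurrentlyLimited n tasks = concurrentlyLimited' n (zip [0..] tasks) [] []

-- Arguments: free slots, pending (index, task) pairs, running tasks, finished results.
concurrentlyLimited' :: Int -> [(Int, IO a)] -> [Async (Int, a)] -> [(Int, a)] -> IO [a]
-- Nothing pending and nothing running: restore the input order and return.
concurrentlyLimited' _ [] [] results = return . map snd $ sortBy (comparing fst) results
-- No free slot: wait for any running task to finish, then free its slot.
concurrentlyLimited' 0 todo ongoing results = do
    (task, newResult) <- waitAny ongoing
    concurrentlyLimited' 1 todo (delete task ongoing) (newResult:results)
-- Nothing left to start: drain the tasks that are still running.
concurrentlyLimited' _ [] ongoing results = concurrentlyLimited' 0 [] ongoing results
-- A free slot and a pending task: start the task, tagged with its index.
concurrentlyLimited' n ((i, task):otherTasks) ongoing results = do
    t <- async $ (i,) <$> task
    concurrentlyLimited' (n-1) otherTasks (t:ongoing) results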

Note: thanks to lifted-async, the code above can be made more generic by using any instance of MonadBaseControl IO in place of IO.
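
A similar scheduling effect can be sketched with a fixed pool of workers that pull tasks from a shared queue, so that only n threads ever exist. This is an illustration under assumptions rather than the answer's own method: the name workerPool is hypothetical, and replicateConcurrently_ requires a reasonably recent version of async.

```haskell
import Control.Concurrent.Async (replicateConcurrently_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import Data.List (sortOn)

-- | Run the actions on a fixed pool of worker threads and return the results
-- in input order. 'workerPool' is a hypothetical name for this sketch.
workerPool :: Int -> [IO a] -> IO [a]
workerPool n tasks = do
    queue   <- newIORef (zip [0 :: Int ..] tasks)  -- pending (index, action) pairs
    results <- newIORef []                          -- finished (index, result) pairs
    let worker = do
            next <- atomicModifyIORef' queue pop
            case next of
                Nothing        -> pure ()           -- queue drained: this worker exits
                Just (i, task) -> do
                    r <- task
                    atomicModifyIORef' results (\rs -> ((i, r) : rs, ()))
                    worker
    -- Use at least one worker so that a non-positive n cannot skip the tasks.
    replicateConcurrently_ (max 1 n) worker
    map snd . sortOn fst <$> readIORef results
  where
    -- Atomically take the first pending task, if there is one.
    pop []       = ([], Nothing)
    pop (t : ts) = (ts, Just t)

main :: IO ()
main = do
    rs <- workerPool 2 (map (pure . (* 10)) [1, 2, 3 :: Int])
    print rs  -- prints [10,20,30]
```

If any task throws, replicateConcurrently_ cancels the remaining workers and rethrows the exception, so no partial result list is returned.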

Answer 4 (score: 0)

If you have your actions in a list, this one has fewer dependencies:

import Control.Concurrent.Async (mapConcurrently)
import Data.List.Split (chunksOf)

mapConcurrentChunks :: Int -> (a -> IO b) -> [a] -> IO [b]
mapConcurrentChunks n ioa xs = concat <$> mapM (mapConcurrently ioa) (chunksOf n xs)

Edit: just shortened it a bit.
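
For readers who also want to avoid the split package, the chunking approach can be sketched with a hand-written splitter built on splitAt. The names chunks and mapChunked are invented for this example. Each chunk still has to finish completely before the next one starts.

```haskell
import Control.Concurrent.Async (mapConcurrently)

-- | Split a list into pieces of at most @n@ elements.
-- 'chunks' is a hypothetical helper standing in for chunksOf.
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = piece : chunks n rest
  where
    -- Clamp the size to 1 so that a non-positive n cannot loop forever.
    (piece, rest) = splitAt (max 1 n) xs

-- | Run the action over the list one chunk at a time, with every element of
-- a chunk running concurrently.
mapChunked :: Int -> (a -> IO b) -> [a] -> IO [b]
mapChunked n f xs = concat <$> mapM (mapConcurrently f) (chunks n xs)

main :: IO ()
main = do
    rs <- mapChunked 2 (pure . (+ 1)) [1 .. 5 :: Int]
    print rs  -- prints [2,3,4,5,6]
```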