我正在编写一个程序,它从标准输入读取多个URL(每行一个),稍微调整它们并并行地为这些多个URL中的每个URL发出HTTP请求。响应打印到标准输出。这是代码:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Monad
import Network.Wreq
import Control.Concurrent.MSem
import Control.Concurrent.Async
import Control.Concurrent (threadDelay)
import qualified Data.Traversable as T
main :: IO ()
main = void $ mapPool 4 (const processUrl) [1..]
mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPool max f xs = do semaphore <- new max
mapConcurrently (with semaphore . f) xs
processUrl :: IO ()
processUrl = do param <- getLine
response <- get (url ++ param)
print response
url = "http://example.com/resources?param="
并行性在这里被硬编码为四个。当批处理中的某些 IO动作(HTTP请求)失败时,就会出现问题。根据{{1}}的设计,如果一个操作失败,其余操作将被取消。在我的情况下,似乎最后一批将始终失败,因为输入命中EOF,发生异常,程序输出:
Control.Concurrent.Async.mapConcurrently
是否有mapConcurrent的替代方法,如果一个以异常结束,则不会取消所有其他操作?如果没有,是否有更好的方法来处理这类任务?
答案 0 :(得分:2)
是否有mapConcurrent的替代方法不会全部取消 如果一个以异常结束的其他操作?
这里的异常是可以预测的,所以我们也许应该在源代码处理问题,例如checking for EOF在读取每一行之前。我们可以将其放在使用IO (Maybe String)
来表示EOF的Nothing
操作中。
getLineMaybe :: IO (Maybe String)
getLineMaybe =
do isEOF <- hIsEOF stdin
if isEOF then return Nothing
else Just <$> System.IO.getLine
您的示例存在问题:同时写入标准输出是likely to produce a garbled result。写入stdout的过程应该只从一个线程完成,也可以从stdin读取。
也许我们可以有两个(可关闭的和有界的)并发队列,一个是我们从stdin读取行,另一个是我们将处理后的结果放在后面写的。将一个连接到另一个将有许多工作线程。
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Concurrent.STM.TVar
import qualified Control.Concurrent.STM.TBMQueue as Q -- closeable, bounded queue
和这个助手功能
untilNothing :: IO (Maybe a) -> (a -> IO ()) -> IO () -> IO ()
untilNothing action handler finalizer =
let go = do mx <- action
case mx of
Nothing -> finalizer
Just x -> do handler x
go
in go
我们可以编写如下通用函数
data ConcConf = ConcConf {
pendingQueueSize :: Int,
doneQueueSize :: Int,
concurrencyLevel :: Int
} deriving Show
concPipeline :: ConcConf -> IO (Maybe a) -> (a -> IO b) -> (b -> IO ()) -> IO ()
concPipeline conf reader transformer writer =
do src <- atomically $ Q.newTBMQueue (pendingQueueSize conf)
dst <- atomically $ Q.newTBMQueue (doneQueueSize conf)
workersLeft <- atomically $ newTVar (concurrencyLevel conf)
let gang = replicateConcurrently_ (concurrencyLevel conf)
pipeline =
untilNothing reader
(\a -> atomically $ Q.writeTBMQueue src a)
(atomically $ Q.closeTBMQueue src)
`concurrently_`
untilNothing (atomically $ Q.readTBMQueue dst)
writer
(pure ())
`concurrently_`
-- worker threads connecting reader and writer
gang (untilNothing (atomically $ Q.readTBMQueue src)
(\a -> do b <- transformer a
atomically $ Q.writeTBMQueue dst b)
-- last one remaining closes shop
(atomically $ do modifyTVar' workersLeft pred
c <- readTVar workersLeft
if c == 0 then Q.closeTBMQueue dst
else pure ()))
pipeline