使用mapConcurrently读取stdin,进行HTTP调用并并行写入stdout

时间:2018-05-12 12:36:51

标签: haskell parallel-processing stdout stdin

我正在编写一个程序,它从标准输入读取多个URL(每行一个),稍微调整它们并并行地为这些多个URL中的每个URL发出HTTP请求。响应打印到标准输出。这是代码:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Monad
import Network.Wreq
import Control.Concurrent.MSem
import Control.Concurrent.Async
import Control.Concurrent (threadDelay)
import qualified Data.Traversable as T

main :: IO ()
main = void $ mapPool 4 (const processUrl) [1..]

mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPool max f xs = do semaphore <- new max
                      mapConcurrently (with semaphore . f) xs

processUrl :: IO ()
processUrl = do param <- getLine
                response <- get (url ++ param)
                print response

url = "http://example.com/resources?param="

并行性在这里被硬编码为四个。当批处理中的某些 IO动作(HTTP请求)失败时,就会出现问题。根据{{​​1}}的设计,如果一个操作失败,其余操作将被取消。在我的情况下,似乎最后一批将始终失败,因为输入命中EOF,发生异常,程序输出:

Control.Concurrent.Async.mapConcurrently

是否有mapConcurrent的替代方法,如果一个以异常结束,则不会取消所有其他操作?如果没有,是否有更好的方法来处理这类任务?

1 个答案:

答案 0 :(得分:2)

  

是否有mapConcurrent的替代方法不会全部取消   如果一个以异常结束的其他操作?

这里的异常是可以预测的,所以我们也许应该在源代码处理问题,例如checking for EOF在读取每一行之前。我们可以将其放在使用IO (Maybe String)来表示EOF的Nothing操作中。

getLineMaybe :: IO (Maybe String)
getLineMaybe =
    do isEOF <- hIsEOF stdin
       if isEOF then return Nothing
                else Just <$> System.IO.getLine

您的示例存在问题:同时写入标准输出是likely to produce a garbled result。写入stdout的过程应该只从一个线程完成,也可以从stdin读取。

也许我们可以有两个(可关闭的和有界的)并发队列,一个是我们从stdin读取行,另一个是我们将处理后的结果放在后面写的。将一个连接到另一个将有许多工作线程。

使用套餐asyncstmstm-chans

import           Control.Concurrent.Async
import           Control.Concurrent.STM
import           Control.Concurrent.STM.TVar
import qualified Control.Concurrent.STM.TBMQueue as Q -- closeable, bounded queue

和这个助手功能

untilNothing :: IO (Maybe a) -> (a -> IO ()) -> IO () -> IO ()
untilNothing action handler finalizer =
   let go = do mx <- action
               case mx of
                   Nothing -> finalizer
                   Just x -> do handler x
                                go
    in go     

我们可以编写如下通用函数

data ConcConf = ConcConf {
                           pendingQueueSize :: Int,
                           doneQueueSize :: Int,
                           concurrencyLevel :: Int
                         } deriving Show

concPipeline :: ConcConf -> IO (Maybe a) -> (a -> IO b) -> (b -> IO ()) -> IO ()
concPipeline conf reader transformer writer =
    do src <- atomically $ Q.newTBMQueue (pendingQueueSize conf)
       dst <- atomically $ Q.newTBMQueue (doneQueueSize conf)
       workersLeft <- atomically $ newTVar (concurrencyLevel conf)
       let gang = replicateConcurrently_ (concurrencyLevel conf)
           pipeline =
               untilNothing reader
                            (\a -> atomically $ Q.writeTBMQueue src a)
                            (atomically $ Q.closeTBMQueue src)
               `concurrently_`
               untilNothing (atomically $ Q.readTBMQueue dst)
                            writer
                            (pure ())
               `concurrently_`
               -- worker threads connecting reader and writer
               gang (untilNothing (atomically $ Q.readTBMQueue src)
                                  (\a -> do b <- transformer a
                                            atomically $ Q.writeTBMQueue dst b)
                                  -- last one remaining closes shop
                                  (atomically $ do modifyTVar' workersLeft pred
                                                   c <- readTVar workersLeft
                                                   if c == 0 then Q.closeTBMQueue dst
                                                             else pure ()))
       pipeline