Question

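I have several thousand output files that cannot all be processed at once, so I want a function that processes them in chunks of n files at a time. I decided to use a TBQueue for this.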

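The idea is to first fill the queue with n dummy values and then loop, trying to read the next dummy value from the queue. If a value is available, the IO action runs, and when it finishes a new value is added to the queue. Otherwise readTBQueue blocks until one of the running jobs finishes (at least, that is what I hope).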

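My questions are:
1. When there are no more files to process, does the main thread wait until all of its children have finished?
2. What happens if one async crashes? Is the dummy value still written to the queue?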

processFiles :: Int -> [FilePath] -> (FilePath -> IO ()) -> IO ()
processFiles n fs fun = do
                 tbQ  <- atomically $ newTBQueue n
                 atomically $ replicateM_ n $ writeTBQueue tbQ () 
                 loop fs tbQ
 where loop :: [FilePath] -> TBQueue () -> IO () 
       loop files queue | null files = return ()  
                        | otherwise  = do 
                                       join . atomically $ do 
                                         readTBQueue queue
                                         let file = head files 
                                         return $ withAsync (fun file) $ \a -> do 
                                                        wait a 
                                                        atomically $ writeTBQueue queue ()
                                       loop (tail files) queue

Following MathematicalOrchid's suggestion (thanks!), I wrote a new implementation:

processFiles :: Int -> [FilePath] -> (FilePath -> IO ()) -> IO ()
processFiles n fs fun = do
                 tbQ  <- atomically $ newTBQueue n
                 loop fs tbQ
 where loop :: [FilePath] -> TBQueue FilePath -> IO () 
       loop files queue | null files = return ()  
                        | otherwise  = do 
                                       join . atomically $ do 
                                         writeTBQueue queue (head files)
                                         let actionSTM = atomically $ readTBQueue queue
                                         return $ withAsync actionSTM $ \a -> do 
                                                        file <- wait a 
                                                        async (fun file) >>= doSomethingOnException
                                       loop (tail files) queue
       doSomethingOnException  :: Async () -> IO ()
       doSomethingOnException a = do 
           r <- waitCatch a
           case r of
                Left exception -> undefined
                Right _        -> return ()

However, I am still not sure when the loop function returns; it has to wait for the pending jobs.

Answer 1

There seem to be two separate issues here: synchronization and reliability.

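STM is all about letting multiple threads access mutable data without corrupting it. TBQueue should handle that just fine. If you want "crashed" operations to be restarted... you will need to build additional infrastructure for that.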
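As a rough illustration of what such "additional infrastructure" could look like, here is a minimal sketch of a retry wrapper. The name `retrying` and its retry-count parameter are hypothetical (they are not part of stm or async); the sketch only assumes `withAsync` and `waitCatch` from the async package, which the question already uses.

```haskell
import Control.Concurrent.Async (waitCatch, withAsync)
import Control.Exception (SomeException)

-- | Hypothetical helper: run an action and, if it throws, run it again,
-- up to `retries` additional times. Returns the last result, which is
-- either the final exception (Left) or the action's value (Right).
retrying :: Int -> IO a -> IO (Either SomeException a)
retrying retries action = do
  -- Run the action in its own thread; waitCatch turns an exception
  -- thrown by that thread into a Left value instead of rethrowing it.
  r <- withAsync action waitCatch
  case r of
    Left _ | retries > 0 -> retrying (retries - 1) action
    _                    -> return r
```

Whether blindly retrying is appropriate depends on the operation: a file that fails because it is malformed will fail every time, so a real version would probably inspect the exception before trying again.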

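Is there a particular reason for filling the queue with "dummy values" rather than, say, the actual filenames to be processed? If it were me, the main thread's job would be to fill the queue with filenames (when the queue gets too full, the main thread will block while the worker threads do their work). If you want to recover from "crashed" threads, have the top-level code of each worker catch exceptions and retry the operation or something. Or, that is how I would do it...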
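The design described above could be sketched roughly as follows. This is only one possible reading of the suggestion, not a definitive implementation. It keeps the `processFiles` signature from the question, but the `Maybe FilePath` end-of-input marker, the clamping of the worker count to at least one, and the "just log the failure" policy are all assumptions added for illustration. It relies on `concurrently_` and `replicateConcurrently_` from the async package and `newTBQueueIO` from stm.

```haskell
import Control.Concurrent.Async
  (concurrently_, replicateConcurrently_, waitCatch, withAsync)
import Control.Concurrent.STM
  (atomically, newTBQueueIO, readTBQueue, writeTBQueue)
import Control.Monad (forM_, replicateM_)

-- | Process every file, with at most n jobs running at the same time.
-- Returns only after all files have been handled.
processFiles :: Int -> [FilePath] -> (FilePath -> IO ()) -> IO ()
processFiles n fs fun = do
  let workers = max 1 n                 -- assumption: always at least one worker
  queue <- newTBQueueIO (fromIntegral workers)
  let -- The producer blocks in writeTBQueue whenever the queue is full.
      producer = do
        forM_ fs $ \f -> atomically (writeTBQueue queue (Just f))
        -- One Nothing per worker tells each of them to stop.
        replicateM_ workers (atomically (writeTBQueue queue Nothing))
      worker = do
        next <- atomically (readTBQueue queue)
        case next of
          Nothing -> return ()          -- no more files: this worker exits
          Just f  -> do
            -- Top-level exception handling for each job: a failing file
            -- is reported and the worker carries on with the next one.
            r <- withAsync (fun f) waitCatch
            case r of
              Left e   -> putStrLn ("failed: " ++ f ++ ": " ++ show e)
              Right () -> return ()
            worker
  -- Run the producer and all workers; wait for every one of them.
  concurrently_ producer (replicateConcurrently_ workers worker)
```

Because `concurrently_` waits for both sides, the call returns only once the producer has queued everything and every worker has read its stop marker, which addresses the "when does it return" concern. A crashed job does not take a slot away from the pool, since the worker thread itself survives and keeps reading from the queue.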
