是否可以创建获取在特定时间段内向下游发送的所有值的管道?我正在实现一个服务器,协议允许我连接传出的数据包并将它们压缩在一起,所以我希望每隔100毫秒有效地“清空”下游ByteString
的队列,并mappend
然后一起进入下一个进行压缩的管道。
答案 0 :(得分:3)
以下是使用pipes-concurrency
的解决方案。你给它任何Input
,它会定期消耗所有值的输入:
import Control.Applicative ((<|>))
import Control.Concurrent (threadDelay)
import Data.Foldable (forM_)
import Pipes
import Pipes.Concurrent
drainAll :: Input a -> STM (Maybe [a])
drainAll i = do
ma <- recv i
case ma of
Nothing -> return Nothing
Just a -> loop (a:)
where
loop diffAs = do
ma <- recv i <|> return Nothing
case ma of
Nothing -> return (Just (diffAs []))
Just a -> loop (diffAs . (a:))
bucketsEvery :: Int -> Input a -> Producer [a] IO ()
bucketsEvery microseconds i = loop
where
loop = do
lift $ threadDelay microseconds
ma <- lift $ atomically $ drainAll i
forM_ ma $ \a -> do
yield a
loop
通过选择用于构建Buffer
的{{1}}类型,您可以更好地控制从上游使用元素的方式。
如果您是Input
的新用户,可以阅读the tutorial,其中介绍了如何使用pipes-concurrency
,spawn
和Buffer
。
答案 1 :(得分:1)
这是一个可能的解决方案。它基于Pipe
标记ByteString
下游的Bool
,以便识别属于同一“时间段”的ByteStrings
。
首先,一些进口:
import Data.AdditiveGroup
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Builder as BB
import Data.Thyme.Clock
import Data.Thyme.Clock.POSIX
import Control.Monad.State.Strict
import Control.Lens (view)
import Control.Concurrent (threadDelay)
import Pipes
import Pipes.Lift
import qualified Pipes.Prelude as P
import qualified Pipes.Group as PG
以下是标记Pipe
。它在内部使用StateT
:
tagger :: Pipe B.ByteString (B.ByteString,Bool) IO ()
tagger = do
startTime <- liftIO getPOSIXTime
evalStateP (startTime,False) $ forever $ do
b <- await
currentTime <- liftIO getPOSIXTime
-- (POSIXTime,Bool) inner state
(baseTime,tag) <- get
if (currentTime ^-^ baseTime > timeLimit)
then let tag' = not tag in
yield (b,tag') >> put (currentTime, tag')
else yield $ (b,tag)
where
timeLimit = fromSeconds 0.1
然后我们可以使用pipes-group
包中的函数将属于同一“时间段”的ByteString
分组到懒惰的ByteString
中:
batch :: Producer B.ByteString IO () -> Producer BL.ByteString IO ()
batch producer = PG.folds (<>) mempty BB.toLazyByteString
. PG.maps (flip for $ yield . BB.byteString . fst)
. view (PG.groupsBy $ \t1 t2-> snd t1 == snd t2)
$ producer >-> tagger
似乎批量正确。这个计划:
main :: IO ()
main = do
count <- P.length $ batch (yield "boo" >> yield "baa")
putStrLn $ show count
count <- P.length $ batch (yield "boo" >> yield "baa"
>> liftIO (threadDelay 200000) >> yield "ddd")
putStrLn $ show count
有输出:
1
2
请注意,当下一个存储桶的第一个元素到达时,“时间桶”的内容仅为yield
。它们不会每100毫秒自动yield
。这对您来说可能是也可能不是问题。您希望每100毫秒自动yield
,您需要一个不同的解决方案,可能基于pipes-concurrency
。
此外,您可以考虑直接使用FreeT
提供的基于pipes-group
的“效果列表”。这样,您就可以在存储桶已满之前开始在“时间段”中压缩数据。
答案 2 :(得分:0)
与Daniel的回答不同,我不会在生成数据时对其进行标记。它至少需要来自上游的元素,然后继续在monoid中聚合更多的值,直到时间间隔过去。
此代码使用列表进行聚合,但有更好的幺半群与
聚合import Pipes
import qualified Pipes.Prelude as P
import Data.Time.Clock
import Data.Time.Calendar
import Data.Time.Format
import Data.Monoid
import Control.Monad
-- taken from pipes-rt
doubleToNomDiffTime :: Double -> NominalDiffTime
doubleToNomDiffTime x =
let d0 = ModifiedJulianDay 0
t0 = UTCTime d0 (picosecondsToDiffTime 0)
t1 = UTCTime d0 (picosecondsToDiffTime $ floor (x/1e-12))
in diffUTCTime t1 t0
-- Adapted from from pipes-parse-1.0
wrap
:: Monad m =>
Producer a m r -> Producer (Maybe a) m r
wrap p = do
p >-> P.map Just
forever $ yield Nothing
yieldAggregateOverTime
:: (Monoid y, -- monoid dependance so we can do aggregation
MonadIO m -- to beable to get the current time the
-- base monad must have access to IO
) =>
(t -> y) -- Change element from upstream to monoid
-> Double -- Time in seconds to aggregate over
-> Pipe (Maybe t) y m ()
yieldAggregateOverTime wrap period = do
t0 <- liftIO getCurrentTime
loop mempty (dtUTC `addUTCTime` t0)
where
dtUTC = doubleToNomDiffTime period
loop m ts = do
t <- liftIO getCurrentTime
v0 <- await -- await at least one element
case v0 of
Nothing -> yield m
Just v -> do
if t > ts
then do
yield (m <> wrap v)
loop mempty (dtUTC `addUTCTime` ts)
else do
loop (m <> wrap v) ts
main = do
runEffect $ wrap (each [1..]) >-> yieldAggregateOverTime (\x -> [x]) (0.0001)
>-> P.take 10 >-> P.print
根据cpu加载情况,输出数据的汇总方式会有所不同。每个块中至少有一个元素。
$ ghc Main.hs -O2
$ ./Main
[1,2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
$ ./Main
[1,2]
[3]
[4]
[5]
[6,7,8,9,10]
[11,12,13,14,15,16,17,18]
[19,20,21,22,23,24,25,26]
[27,28,29,30,31,32,33,34]
[35,36,37,38,39,40,41,42]
[43,44,45,46,47,48,49,50]
$ ./Main
[1,2,3,4,5,6]
[7]
[8]
[9,10,11,12,13,14,15,16,17,18,19,20]
[21,22,23,24,25,26,27,28,29,30,31,32,33]
[34,35,36,37,38,39,40,41,42,43,44]
[45,46,47,48,49,50,51,52,53,54,55]
[56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]
[73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88]
[89,90,91,92,93,94,95,96,97,98,99,100,101,102,103]
$ ./Main
[1,2,3,4,5,6,7]
[8]
[9]
[10,11,12,13,14,15,16,17,18]
[19,20,21,22,23,24,25,26,27]
[28,29,30,31,32,33,34,35,36,37]
[38,39,40,41,42,43,44,45,46]
[47,48,49,50]
[51,52,53,54,55,56,57]
[58,59,60,61,62,63,64,65,66]
你可能想看一下的源代码 pipes-rt它显示了一种处理管道时间的方法。
编辑:感谢DanielDíazCarrete,改进了pipe-parse-1.0技术来处理上游终止。管道组解决方案也应该使用相同的技术。