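Revision summary
OK, so it looks like the system calls definitely have to do with GC, and the underlying problem is simply that GC is happening too often. As best I can tell from profiling, this is related to the use of splitWhen and pack.
splitWhen's implementation converts each chunk from lazy to strict text and concatenates them all as it builds up a buffer of chunks. That is bound to allocate a lot.
pack converts from one type to another, so it has to allocate, and it sits in my inner loop, so that makes sense too.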
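Original question
I've stumbled upon some surprising system call activity in Haskell enumerator-based IO. I'm hoping someone can shed some light on it.
On and off, I've been toying with a Haskell version of a quick Perl script I wrote a few months back. The script reads some JSON from each line and then prints out a specific field, if it exists.
Here is the Perl version, along with how I'm running it.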
cat ~/sample_input | perl -lpe '($_) = grep(/type/, split(/,/))' > /dev/null
Here is the Haskell version (it is invoked in the same way as the Perl version).
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Internal as EI
import qualified Data.Enumerator.Text as ET
import qualified Data.Enumerator.List as EL
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Data.Functor
import Control.Monad
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLI
import System.Environment
import System.IO (stdin, stdout)
import GHC.IO.Handle (hSetBuffering, BufferMode(BlockBuffering))
fieldEnumerator field = enumStdin E.$= splitOn [',','\n'] E.$= grabField field
enumStdin = ET.enumHandle stdin
splitOn :: [Char] -> EI.Enumeratee T.Text T.Text IO b
splitOn chars = (ET.splitWhen (`elem` chars))
grabField :: String -> EI.Enumeratee T.Text T.Text IO b
grabField = EL.filter . T.isInfixOf . T.pack
intercalateNewlines = EL.mapM_ (\field -> (TI.putStrLn field >> (putStr "\n\n")))
runE enum = E.run_ $ enum E.$$ intercalateNewlines
main = do
  (field:_) <- getArgs
  runE $ fieldEnumerator field
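Surprisingly, the trace of the Haskell version looks like this (the actual JSON is suppressed because it is data from work), whereas the Perl version does what I'd expect: a bunch of reads followed by a write, repeated.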
55333/0x8816f5: 366125 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 366136 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 367209 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 367218 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 368449 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 368458 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 369525 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 369534 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 370610 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 370620 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 371735 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 371744 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 371798 5 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 371802 3 1 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 372907 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 372918 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 374063 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 374072 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 375147 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 375156 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 376283 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 376292 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 376809 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 376814 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 377378 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 377387 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 378537 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 378546 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 379598 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 379604 3 0 sigreturn(0x7FFF5FBFF9A0, 0x1E, 0x1) = 0 Err#-2
55333/0x8816f5: 379613 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 380667 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 380678 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 381862 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 381871 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 382032 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 382036 4 2 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 383064 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 383073 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 384118 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 384127 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 385206 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 385215 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 386348 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 386358 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 387468 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 387477 11 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 387614 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 387620 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 388597 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 388606 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 389707 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 389716 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 390261 7 3 select(0x2, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 390273 6 3 write(0x1, SOME_OUTPUT, 0x1FA0) = 8096 0
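Answer 0 (score: 7)
Promoting this to the top level from the comments:
FWIW, I've been poking through the runtime (we also discussed this on IRC), and there are only two uses of sigprocmask: the GC and the tty driver. Since the latter is unlikely, I'd suggest profiling to verify that it is doing a lot of GC and then trying to find out why.
It turns out (from IRC) that it is doing 90MB of allocation for 0.5MB of data, and the garbage collector is indeed being triggered a lot. So it now comes down to why the enumerator is doing so much extra allocation.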
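One way to check the "lots of allocation, lots of GC" theory from inside a program is to read the RTS counters before and after a piece of work. The following is only a sketch: `workload` is a made-up stand-in for the real pipeline, and it assumes the program is run with RTS statistics enabled (for example by building with `-with-rtsopts=-T`). Running with `+RTS -sstderr` gives the same kind of numbers without any code changes.

```haskell
import Control.Exception (evaluate)
import qualified Data.Text as T
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Hypothetical workload standing in for the real pipeline: build a
-- synthetic comma-separated input and count the pieces after splitting.
workload :: Int -> Int
workload n = length (T.split (== ',') (T.replicate n (T.pack "\"type\":1,")))

-- Force a collection first so the counters are up to date, then read them.
snapshot :: IO RTSStats
snapshot = performGC >> getRTSStats

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are disabled; rerun with +RTS -T"
    else do
      before <- snapshot
      _ <- evaluate (workload 10000)
      after <- snapshot
      -- The collection count includes the one forced by the second snapshot.
      putStrLn $ "bytes allocated: " ++ show (allocated_bytes after - allocated_bytes before)
      putStrLn $ "collections:     " ++ show (gcs after - gcs before)
```

If the allocation figure is far larger than the input size, that points at the processing stages rather than at the IO itself.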
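Answer 1 (score: 7)
Are you concerned about the allocation, or about (the overhead from?) the calls to sigprocmask?
If it is the former and you want to keep using the enumerator package, this small change helps by about 50% on a 4k test set: allocation drops from 8MB to 4MB, and gen0 GCs drop from 15 to 6.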
splitOn :: EI.Enumeratee T.Text T.Text IO b
splitOn = EL.concatMap (T.split fastSplit)
fastSplit :: Char -> Bool
fastSplit ',' = True
fastSplit '\n' = True
fastSplit _ = False
Before (stats from +RTS -sstderr -RTS):
       8,212,680 bytes allocated in the heap
         696,184 bytes copied during GC
         148,656 bytes maximum residency (1 sample(s))
          30,664 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0        15 colls,     0 par    0.00s    0.00s     0.0001s    0.0005s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0010s    0.0010s
After:
       3,838,048 bytes allocated in the heap
         689,592 bytes copied during GC
         148,368 bytes maximum residency (1 sample(s))
          27,040 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         6 colls,     0 par    0.00s    0.00s     0.0001s    0.0003s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0006s    0.0006s
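To see what the replacement splitter does independently of any streaming library, the split-and-filter step can be tried as a pure function. This is a minimal sketch: `grabFields` and the sample input are made up for illustration, and only `fastSplit` comes from the change above. The needle is packed once in the `where` clause rather than per piece.

```haskell
import qualified Data.Text as T

-- Same predicate as in the change above: split on commas and newlines.
fastSplit :: Char -> Bool
fastSplit ',' = True
fastSplit '\n' = True
fastSplit _ = False

-- Hypothetical pure helper: split a chunk and keep only the pieces that
-- mention the requested field name.
grabFields :: String -> T.Text -> [T.Text]
grabFields field = filter (needle `T.isInfixOf`) . T.split fastSplit
  where
    needle = T.pack field

main :: IO ()
main = mapM_ (putStrLn . T.unpack) (grabFields "type" sample)
  where
    -- Made-up sample input, not the data from the question.
    sample = T.pack "{\"id\":1,\"type\":\"a\"}\n{\"id\":2}\n{\"type\":\"b\",\"id\":3}\n"
```

Each piece returned by `T.split` is a strict `Text`, so nothing here needs to convert between lazy and strict text or concatenate chunks.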
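That is a pretty reasonable improvement, but it certainly leaves something to be desired. Rather than poke at enumerator any further, and just for kicks, I tried rewriting it with conduit-0.4.1. It should be equivalent...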
import Data.Conduit as C
import qualified Data.Conduit.Binary as Cb
import qualified Data.Conduit.List as Cl
import qualified Data.Conduit.Text as Ct
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Control.Monad.Trans (MonadIO, liftIO)
import System.Environment
import System.IO (stdin)
grabField :: Monad m => String -> Conduit T.Text m T.Text
grabField = Cl.filter . T.isInfixOf . T.pack
printField :: MonadIO m => T.Text -> m ()
printField field = liftIO $ do
  TI.putStrLn field
  putStr "\n\n"
fastSplit :: Char -> Bool
fastSplit ',' = True
fastSplit '\n' = True
fastSplit _ = False
main :: IO ()
main = do
  field:_ <- getArgs
  runResourceT $ Cb.sourceHandle stdin
              $$ Ct.decode Ct.utf8
              =$ Cl.concatMap (T.split fastSplit)
              =$ grabField field
              =$ Cl.mapM_ printField
...but for some reason it allocates and retains much less memory:
         835,688 bytes allocated in the heap
           8,576 bytes copied during GC
          87,200 bytes maximum residency (1 sample(s))
          19,968 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0008s    0.0008s
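Answer 2 (score: 4)
If the amount of data read between those sigprocmask calls is large, my first guess is that the runtime blocks signals before a GC run so that the GC cannot be interrupted while the heap is in an inconsistent state.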
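The block-then-restore pattern described here can be illustrated from Haskell with the unix package. This is only an analogy: the GHC runtime does its masking in C, and `withSignalsBlocked` is a made-up helper, not something the runtime exposes. If the trace was taken on macOS, where SIG_BLOCK is 1 and SIG_SETMASK is 3, the paired calls in the question would be consistent with this same shape.

```haskell
import Control.Exception (bracket_)
import System.Posix.Signals
  ( SignalSet, addSignal, blockSignals, emptySignalSet
  , getSignalMask, inSignalSet, setSignalMask, sigINT )

-- Hypothetical helper: block the given signals while the action runs,
-- then restore whatever mask was in place before.
withSignalsBlocked :: SignalSet -> IO a -> IO a
withSignalsBlocked sigs action = do
  old <- getSignalMask
  bracket_ (blockSignals sigs) (setSignalMask old) action

main :: IO ()
main = do
  let sigs = addSignal sigINT emptySignalSet
  -- Inspect the mask inside the protected section and again afterwards.
  inside <- withSignalsBlocked sigs (inSignalSet sigINT <$> getSignalMask)
  outside <- inSignalSet sigINT <$> getSignalMask
  putStrLn ("blocked inside: " ++ show inside ++ ", blocked afterwards: " ++ show outside)
```

Each use of the helper costs two mask-changing system calls, which is why a program that collects very often would show long runs of them in a trace.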
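Answer 3 (score: 3)
More of a comment than an answer: if you grep through the GHC source, you'll see that posix/TTY.c (the TERMIOS code) and sm/GC.c (via {,un}blockUserSignals) are the most likely candidates. You could compile GHC with debugging symbols, or just insert some dummy (unique) system calls so that you can tell the two apart in the system call profile. Another cheap test is to remove any terminal interaction; if the masking behaviour goes away, that would be mild evidence (though not a definitive answer) in favour of the GC.
sigprocmask: I had dismissed it as an unlikely source, but it may actually be the problem!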