Question

我有一个2.5 MB的文件，里面装满了以空格分隔的浮点数（下面的代码可以为你生成），并希望用attoparsec将其解析为数组。

它出乎意料地缓慢，花了将近一秒钟，并分配了大量内存：

time ./Attoparsec-problem +RTS -sstderr < floats.txt
299999.0
   956,647,344 bytes allocated in the heap
   752,875,520 bytes copied during GC
   166,485,416 bytes maximum residency (7 sample(s))
     8,874,168 bytes maximum slop
           337 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0      1604 colls,     0 par    0.21s    0.27s     0.0002s    0.0092s
Gen  1         7 colls,     0 par    0.24s    0.36s     0.0520s    0.1783s

...

%GC     time      65.5%  (75.1% elapsed)

Alloc rate    3,985,781,488 bytes per MUT second

Productivity  34.5% of total user, 28.6% of total elapsed

我的解析器非常简单：基本上是double <* skipSpace。

这是代码：

import           Control.Applicative
import           Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as U

-- Compile with:
-- ghc --make -O2 -prof -auto-all -caf-all -rtsopts -fforce-recomp Attoparsec-problem.hs

-- Run:
-- time ./Attoparsec-problem +RTS -sstderr < floats.txt

main :: IO ()
main = do

  -- For creating the test file (2.5 MB)
  -- writeFile "floats.txt" (Prelude.unwords $ Prelude.map show [1.0,2.0..300000.0])

  r <- parse parser <$> BS.getContents
  case r of
    Done _ arr -> print $ U.last arr
    x          -> print x

  where
    parser = do
      U.replicateM (300000-1) (double <* skipSpace)


-- This gives surprisingly bad productivity (70% GC time) and 180 MB max residency
-- for a 2.5 MB file!

你能解释一下发生了什么吗？

为什么attoparsec使用的内存比输入文件多100倍？

0 个答案: