在Haskell中对键/值对进行分组时出现空间泄漏

时间:2014-04-19 19:16:35

标签: haskell ghc

我的问题是我的代码创建太多thunk (超过270MB),因此在按键分组值时,GC会花费太多时间(超过70%)。我想知道按键分组值的最佳方法是什么。

问题是我有矢量表示的键和值,我想按键保留顺序对值进行分组。例如:

输入:

keys = 1 2 4 3 1 3 4 2 1 
vals = 1 2 3 4 5 6 7 8 9

输出:

1 = 1,5,9
2 = 2,8
3 = 4,6
4 = 3,7

编译选项:

ghc --make -03 -fllvm histogram.hs

在命令式编程中,我只使用多图,因此我决定使用哈希表,其中关联值为[Int]以存储分组值。我希望有更好的FP解决方案。

{-# LANGUAGE BangPatterns #-}
import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
 where
    k = V.generate n (\i -> i `mod` 1000)
    v = V.generate n (\i -> i)

ts :: V.Vector (Int,Int) -> M.HashMap Int Int
ts vec =
    V.foldl' (\ht (k, v) -> M.insertWith (+) k v ht) M.empty vec

ts2 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
ts2 vec =
    V.foldl' (\ht (!k, !v) -> M.insertWith (++) k [v] ht) M.empty vec

main :: IO ()
main = ts2 kv `seq` putStrLn "done"

这是在运行时吐出的东西:

   3,117,102,992 bytes allocated in the heap
   1,847,205,880 bytes copied during GC
     324,159,752 bytes maximum residency (12 sample(s))
       6,502,224 bytes maximum slop
             658 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5991 colls,     0 par    0.58s    0.58s     0.0001s    0.0003s
  Gen  1        12 colls,     0 par    0.69s    0.69s     0.0577s    0.3070s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.45s  (  0.45s elapsed)
  GC      time    1.27s  (  1.27s elapsed)
  EXIT    time    0.03s  (  0.03s elapsed)
  Total   time    1.75s  (  1.75s elapsed)

  %GC     time      72.7%  (72.8% elapsed)

  Alloc rate    6,933,912,935 bytes per MUT second

  Productivity  27.3% of total user, 27.3% of total elapsed

你可以看到它花了很多时间在GC上,所以我决定使用刘海来严格控制列表连接。我想 ++ 也非常昂贵,但不知道解决方法。

3 个答案:

答案 0 :(得分:3)

那些严格的注释是无用的。他们只强制列表的第一个构造函数。

更糟糕的是,您似乎试图离开折叠(++),这绝不是一个好主意。它导致大量无用的中间列表复制,即使它完全严格。

您应该折叠为[Int] -> [Int]值。这将摆脱多个无用的分配。我在移动设备上,所以我无法真正提供完整的示例代码。主要思想是将循环更改为M.insertWith (.) k (v:),然后在折叠后将($ [] )映射到HashMap中的值。

答案 1 :(得分:2)

我尝试在我的主机上运行您的代码,但我无法重现您的个人资料:

runhaskell test8.hs +RTS -sstderr
done
     120,112 bytes allocated in the heap
       3,520 bytes copied during GC
      68,968 bytes maximum residency (1 sample(s))
      12,952 bytes maximum slop
           1 MB total memory in use (0 MB lost due to fragmentation)

                                Tot time (elapsed)  Avg pause  Max pause
 Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
 Gen  1         1 colls,     0 par    0.00s    0.09s     0.0909s    0.0909s

 INIT    time    0.00s  (  0.01s elapsed)
 MUT     time    0.00s  ( 29.21s elapsed)
 GC      time    0.00s  (  0.09s elapsed)
 EXIT    time    0.00s  (  0.09s elapsed)
 Total   time    0.01s  ( 29.40s elapsed)

 %GC     time       5.7%  (0.3% elapsed)

 Alloc rate    381,307,936 bytes per MUT second

 Productivity  91.1% of total user, 0.0% of total elapsed

您能否详细介绍一下如何测试代码?如果你正在使用ghci那么

$ ghci -fobject-code

我们可能需要使用-fobject-code来消除ghci中的任何空间泄漏。如果您已经尝试过ghci选项,假设您正在使用ghci,我将编辑我的答案。此时,我想重现您所看到的问题。

更新: @ duplode:谢谢你的指点。我将删除之前的输出,没有人反对它,因为它有误导性。

我已经能够使用以下选项之一稍微减少gc开销。我得到了一些好处,但开销仍然在49-50%的范围内:

ts3 :: V.Vector (Int, Int) -> M.HashMap Int [Int]
ts3 vec =
V.foldl (\ht (!k, !v) -> 
    let
        element = M.lookup k ht in
    case element of
        Nothing -> M.insert k [v] ht
        Just aList -> M.insert k (v:aList) ht) M.empty vec
ts4 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
ts4 vec = 
    let initMap = V.foldl (\ht (!k,_) -> M.insert k [] ht) M.empty vec
in
    V.foldl (\ht (!k, !v) -> M.adjust(\x -> v:x) k ht) initMap vec

调整看起来好一点,但结果看起来与直接查找类似。用ts4使用adjust:

calling ts4 done.
 3,838,059,320 bytes allocated in the heap
 2,041,603,344 bytes copied during GC
 377,412,728 bytes maximum residency (6 sample(s))
   7,725,944 bytes maximum slop
         737 MB total memory in use (0 MB lost due to fragmentation)

                                Tot time (elapsed)  Avg pause  Max pause
  Gen  0      7260 colls,     0 par    1.32s    1.45s     0.0002s    0.0013s
  Gen  1         6 colls,     0 par    0.88s    1.40s     0.2328s    0.9236s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.18s  (  2.21s elapsed)
  GC      time    2.19s  (  2.85s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.00s  (  0.00s elapsed)
  EXIT    time    0.01s  (  0.07s elapsed)
  Total   time    4.38s  (  5.13s elapsed)

  %GC     time      50.0%  (55.5% elapsed)

  Alloc rate    1,757,267,879 bytes per MUT second

 Productivity  50.0% of total user, 42.7% of total elapsed

使用简单的查找/更新(更新地图的命令式)

calling ts3 done.
3,677,137,816 bytes allocated in the heap
2,040,053,712 bytes copied during GC
 395,867,512 bytes maximum residency (6 sample(s))
   7,326,104 bytes maximum slop
         769 MB total memory in use (0 MB lost due to fragmentation)

                                Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6999 colls,     0 par    1.35s    1.51s     0.0002s    0.0037s
  Gen  1         6 colls,     0 par    1.06s    2.16s     0.3601s    1.3175s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.89s  (  2.07s elapsed)
  GC      time    2.41s  (  3.67s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.00s  (  0.00s elapsed)
  EXIT    time    0.01s  (  0.08s elapsed)
  Total   time    4.31s  (  5.82s elapsed)

  %GC     time      55.9%  (63.0% elapsed)

  Alloc rate    1,942,816,558 bytes per MUT second

  Productivity  44.1% of total user, 32.6% of total elapsed

我有兴趣了解如何减少查询时间,如下面的配置文件输出所示:

COST CENTRE   MODULE  %time %alloc

ts3.\         Main     54.1   91.4
ts3.\.element Main     19.0    2.9
ts3           Main     11.0    2.9
kv.k          Main      6.5    1.4   
kv.v          Main      5.2    1.4
kv.k.\        Main      4.0    0.0


                                                         individual     inherited
COST CENTRE        MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN               MAIN                     72           0    0.0    0.0   100.0  100.0
 main              Main                    158           0    0.0    0.0     0.0    0.0
 CAF:main          Main                    143           0    0.0    0.0    84.2   97.1
  main             Main                    144           1    0.0    0.0    84.2   97.1
   ts3             Main                    145           1   11.0    2.9    84.2   97.1
    ts3.\          Main                    156     5000000   54.1   91.4    73.2   94.3
     ts3.\.element Main                    157     5000000   19.0    2.9    19.0    2.9
 CAF:kv            Main                    142           0    0.0    0.0     0.0    0.0

代码

    -- ghc -O2 --make test8.hs -prof -auto-all -caf-all -fforce-recomp +RTS
    -- ./test8 +RTS -p

    {-# LANGUAGE BangPatterns #-}
    import qualified Data.HashMap.Strict as M
    import qualified Data.Vector.Unboxed as V

    n :: Int
    n = 5000000

    kv :: V.Vector (Int,Int)
    kv = V.zip (k) (v)
     where
        k = V.generate n (\i -> i `mod` 1000)
        v = V.generate n (\i -> i)

    ts :: V.Vector (Int,Int) -> M.HashMap Int Int
    ts vec =
        V.foldl' (\ht (k, v) -> M.insertWith (+) k v ht) M.empty vec



    ts2 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
    ts2 vec =
        V.foldl (\ht (!k, !v) -> M.insertWith (++) k [v] ht) M.empty vec

    ts3 :: V.Vector (Int, Int) -> M.HashMap Int [Int]
    ts3 vec =
        V.foldl (\ht (!k, !v) -> 
            let
                element = M.lookup k ht in
            case element of
                Nothing -> M.insert k [v] ht
                Just aList -> M.insert k (v:aList) ht) M.empty vec
    ts4 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
    ts4 vec = 
            let initMap = V.foldl (\ht (!k,_) -> M.insert k [] ht) M.empty vec
        in
            V.foldl (\ht (!k, !v) -> M.adjust(\x -> v:x) k ht) initMap vec


    main :: IO ()
    main = ts3 kv `seq` putStrLn "calling ts3 done."

    main1 = do
                if x == y then
                    putStrLn "Algos Match"
                else
                    putStrLn "Error"
            where
                x = ts2 kv
                y = ts4 kv

答案 2 :(得分:1)

你的大部分问题都归因于(++)导致大量无用的复制中间列表"正如Carl在答案中所说的那样。在替换(++)时使用了一些不同的方法,到目前为止,我从Data.IntMap.Strict切换到containers得到了最好的结果(只是为了利用较不严格的API - 我不会&# 39;知道哪个实现更高效本身)并使用其alter函数预先添加向量元素而不创建单例列表:

import qualified Data.IntMap.Strict as M
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
 where
    k = V.generate n (\i -> i `mod` 1000)
    v = V.generate n (\i -> i)

ts2 :: V.Vector (Int,Int) -> M.IntMap [Int]
ts2 vec =
    V.foldl' (\ht (k, v) -> M.alter (prep v) k ht) M.empty vec
    where
    prep x = Just . maybe [x] (x:)

main :: IO ()
main = print $ M.foldl' (+) 0 $ M.map length $ ts2 kv

第二个最佳解决方案是使用

\ht (k, v) -> M.insertWith (\(x:_) -> (x :)) k [v] ht

作为折叠运算符。这适用于Data.IntMap.StrictData.HashMap.Strict,结果表现相似。

N.B。:请注意,在所有情况下,包括原始实现,向量元素被预先添加到列表中,而不是附加到列表中。如果要附加元素,则问题会更加严重,因为重复附加到(++)的空列表是元素数量的二次方。