Haskell Data.MemoCombinators performance issue?

Time: 2015-05-19 07:35:40

Tags: performance haskell profiling ghc memoization

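Hi there,

Part of my program to compute differences between files uses the standard DP algorithm to compute the longest common (non-contiguous) subsequence of two lists. I have been running into performance problems with some of the functions, so I ran HPC to profile it and got the following results: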

                                                individual     inherited
COST CENTRE                       no. entries  %time %alloc   %time %alloc
(omitted lines above)
longestCommonSubsequence             1          0.0    0.0    99.9  100.0
 longestCommonSubsequence'           8855742   94.5   98.4    99.9  100.0
  longestCommonSubsequence''         8855742    4.2    0.8     5.4    1.6
   longestCommonSubsequence''.caseY  3707851    0.6    0.6     0.6    0.6
   longestCommonSubsequence''.caseX  3707851    0.6    0.2     0.6    0.2
(omitted lines below)

Here is the code in question:

longestCommonSubsequence' :: forall a. (Eq a) => [a] -> [a] -> Int -> Int -> [a]
longestCommonSubsequence' xs ys i j =
      (Memo.memo2 Memo.integral Memo.integral (longestCommonSubsequence'' xs ys)) i j

longestCommonSubsequence'' :: forall a. (Eq a) => [a] -> [a] -> Int -> Int -> [a]
longestCommonSubsequence'' [] _ _ _ = []
longestCommonSubsequence'' _ [] _ _ = []
longestCommonSubsequence'' (x:xs) (y:ys) i j =
    if x == y
        then x : (longestCommonSubsequence' xs ys (i + 1) (j + 1)) -- WLOG
        else if (length caseX) > (length caseY)
            then caseX
            else caseY
    where
        caseX :: [a]
        caseX = longestCommonSubsequence' xs (y:ys) (i + 1) j

        caseY :: [a]
        caseY = longestCommonSubsequence' (x:xs) ys i (j + 1)

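What strikes me is that virtually all of the time and memory usage occurs in longestCommonSubsequence', i.e. the memoizing wrapper. I would therefore conclude that the performance hit comes from the lookups and caching done by Data.MemoCombinators, even though it has always performed admirably the many other times I have used it.

I guess my question is: this conclusion seems reasonable, but is it correct? If so, does anyone have suggestions for other ways to implement the DP?

For reference, comparing two 14-line files with the contents "a\nb\nc\n...m" and "*a\nb\nc\n...m*" respectively (the same content, but with '*' prepended and appended) takes 12 seconds, which is absurdly long.

Thanks in advance! :)

Edit: I am trying out ghc-core now; if I can get it to play nicely with the Cabal project and extract any useful information, I will post an update!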

1 Answer:

Answer 0 (score: 1)

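When you call Memo.memo2 Memo.integral Memo.integral (longestCommonSubsequence'' xs ys), a memo table is created for the function longestCommonSubsequence'' xs ys. That means there is a separate memo table for every distinct pair of values of xs and ys, and each recursive call builds a fresh one. I suspect most of the execution time is spent creating the data structures for all of these memo tables, so nothing is actually shared between calls.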
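To illustrate the difference, here is a minimal sketch in which the memo table is created only once per top-level call and is keyed on the indices i and j alone. It assumes the data-memocombinators package is available. The name lcsMemo and the use of Data.Array for constant-time element access are illustrative choices, not part of the original program.

```haskell
import qualified Data.MemoCombinators as Memo
import Data.Array (listArray, (!))

-- Hypothetical replacement for the three original functions.
lcsMemo :: Eq a => [a] -> [a] -> [a]
lcsMemo xs ys = go 0 0
  where
    n  = length xs
    m  = length ys
    -- Arrays give O(1) access to the element at a given index.
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys

    -- 'go' is a value, not a function of xs and ys, so the memo table
    -- is built once and shared by every recursive call.
    go = Memo.memo2 Memo.integral Memo.integral go'

    -- The actual recurrence; it recurses through the memoized 'go'.
    go' i j
      | i >= n || j >= m            = []
      | xa ! i == ya ! j            = xa ! i : go (i + 1) (j + 1)
      | length caseX > length caseY = caseX
      | otherwise                   = caseY
      where
        caseX = go (i + 1) j
        caseY = go i (j + 1)
```

The recurrence is the same as in the question; only the place where the table is created has changed, so each (i, j) pair should be evaluated at most once.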

Did you mean to memoize all four arguments of longestCommonSubsequence''?
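Since the two lists are fixed for the whole computation, memoizing on the two indices is enough. As for other ways to implement the DP, one common option is a lazily evaluated array used as the DP table. The sketch below only needs the array package that ships with GHC; the name lcsArray is hypothetical, and the code is meant as an illustration rather than a tuned implementation.

```haskell
import Data.Array (array, listArray, range, (!))

-- Hypothetical lazy-array variant of the same recurrence.
lcsArray :: Eq a => [a] -> [a] -> [a]
lcsArray xs ys = table ! (0, 0)
  where
    n    = length xs
    m    = length ys
    xa   = listArray (0, n - 1) xs
    ya   = listArray (0, m - 1) ys
    bnds = ((0, 0), (n, m))

    -- Each cell is a thunk that refers to other cells of the same table;
    -- laziness ensures every cell is evaluated at most once.
    table = array bnds [ (ij, cell ij) | ij <- range bnds ]

    cell (i, j)
      | i == n || j == m            = []
      | xa ! i == ya ! j            = xa ! i : table ! (i + 1, j + 1)
      | length caseX > length caseY = caseX
      | otherwise                   = caseY
      where
        caseX = table ! (i + 1, j)
        caseY = table ! (i, j + 1)
```

Calling length in every cell still costs time proportional to the subsequence length; storing the length alongside each list in the table would avoid that, at the price of slightly more code.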