Hi there,
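Part of my program to compute differences between files uses the standard DP algorithm to compute the longest common non-contiguous subsequence of two lists. I've been running into performance problems with some functions, so I ran HPC to profile it and found the following results: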
                                                  individual     inherited
COST CENTRE                        no.   entries  %time %alloc   %time %alloc
(omitted lines above)
longestCommonSubsequence                       1    0.0    0.0    99.9  100.0
longestCommonSubsequence'                8855742   94.5   98.4    99.9  100.0
longestCommonSubsequence''               8855742    4.2    0.8     5.4    1.6
longestCommonSubsequence''.caseY         3707851    0.6    0.6     0.6    0.6
longestCommonSubsequence''.caseX         3707851    0.6    0.2     0.6    0.2
(omitted lines below)
Here is the offending code:
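{-# LANGUAGE ScopedTypeVariables #-}
-- ScopedTypeVariables lets the [a] signatures in the where clause refer to
-- the a bound by the forall in the top-level signature.
import qualified Data.MemoCombinators as Memo

-- Memoizing wrapper around longestCommonSubsequence''.
longestCommonSubsequence' :: forall a. (Eq a) => [a] -> [a] -> Int -> Int -> [a]
longestCommonSubsequence' xs ys i j =
    (Memo.memo2 Memo.integral Memo.integral (longestCommonSubsequence'' xs ys)) i j

longestCommonSubsequence'' :: forall a. (Eq a) => [a] -> [a] -> Int -> Int -> [a]
longestCommonSubsequence'' [] _ _ _ = []
longestCommonSubsequence'' _ [] _ _ = []
longestCommonSubsequence'' (x:xs) (y:ys) i j =
    if x == y
        then x : (longestCommonSubsequence' xs ys (i + 1) (j + 1)) -- WLOG
        else if (length caseX) > (length caseY)
            then caseX
            else caseY
  where
    -- Drop the head of the first list.
    caseX :: [a]
    caseX = longestCommonSubsequence' xs (y:ys) (i + 1) j
    -- Drop the head of the second list.
    caseY :: [a]
    caseY = longestCommonSubsequence' (x:xs) ys i (j + 1)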
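What I find notable is that all of the time and memory usage happens in longestCommonSubsequence', i.e. the memoizing wrapper. So I would conclude that the performance hit comes from all the lookups and caching done by Data.MemoCombinators, even though it has always performed admirably the many other times I've used it.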
I guess my question is... does this conclusion seem reasonable? If so, does anyone have suggestions for other ways to implement the DP?
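As one possible alternative to a memo library, here is a minimal bottom-up sketch, assuming plain lists as input. It fills the DP table one row per element of the first list, and each cell carries the subsequence length alongside the (reversed) subsequence, so no `length` calls are needed. `lcsRows` is a hypothetical name, not part of the original program, and when several longest common subsequences exist it may return a different one than `longestCommonSubsequence''`.

```haskell
-- Minimal bottom-up sketch; lcsRows is a hypothetical name, not from the question.
-- Each DP cell stores (LCS length, LCS in reverse order) for a pair of prefixes.
lcsRows :: Eq a => [a] -> [a] -> [a]
lcsRows xs ys = reverse (snd (last (foldl step firstRow xs)))
  where
    -- Row for the empty prefix of xs: every LCS is empty.
    firstRow = replicate (length ys + 1) (0 :: Int, [])

    -- Build the row for the prefix ending in x from the previous row.
    step prev x = row
      where
        -- Cell j+1 needs prev !! j (diagonal), prev !! (j+1) (above) and
        -- row !! j (left), so the row is defined lazily in terms of itself.
        row = (0, []) : zipWith3 cell ys (zip prev (tail prev)) row
        cell y (diag, up) left
          | x == y = (fst diag + 1, y : snd diag)
          | fst up >= fst left = up
          | otherwise = left
```

Filling the table left to right like this replaces the top-down recursion entirely, so there is no memo table whose lifetime has to be managed.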
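For reference, comparing two 14-line files with the respective contents "a\nb\nc\n...m" and "*a\nb\nc\n...m*" (the same content, but with '*' prepended and appended) takes 12 seconds, which is ridiculously long.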
Thanks in advance! :)
EDIT: Trying out the ghc-core stuff now; if I can get it to play nicely with a Cabal project and get anything useful out of it, I'll post an update!
Answer 0 (score: 1)
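When you call Memo.memo2 Memo.integral Memo.integral (longestCommonSubsequence'' xs ys), a memo table is created for the function longestCommonSubsequence'' xs ys. This means there is a separate memo table for every distinct value of xs and ys, and since every recursive call goes back through longestCommonSubsequence', which builds a new table each time, no cached result is ever reused. I suspect most of the execution time is spent creating all the data structures for all of these memo tables.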
Did you mean to memoize on all 4 arguments of longestCommonSubsequence''?
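One way to act on this is to build the table once per pair of input lists and key it only on the indices (i, j), so every recursive step shares it. The minimal sketch below assumes a lazily evaluated `Data.Array` (from the `array` package) as the memo table in place of `Data.MemoCombinators`; `lcsArray` is a hypothetical name, and the case analysis mirrors the question's code, including the `length` comparison and the preference for caseY on ties.

```haskell
import Data.Array (array, listArray, (!))

-- Minimal sketch; lcsArray is a hypothetical name, not from the question.
-- The memo table is built once per pair of inputs and keyed only on (i, j).
lcsArray :: Eq a => [a] -> [a] -> [a]
lcsArray xs ys = table ! (0, 0)
  where
    n = length xs
    m = length ys
    -- Arrays give O(1) access to the i-th and j-th input elements.
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys
    -- Boxed arrays are lazy in their elements: each cell is a thunk that is
    -- evaluated at most once and then shared by every lookup.
    table = array ((0, 0), (n, m)) [((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m]]
    -- Same case analysis as longestCommonSubsequence'', but on indices.
    cell i j
      | i == n || j == m = []
      | xa ! i == ya ! j = xa ! i : table ! (i + 1, j + 1)
      | length caseX > length caseY = caseX
      | otherwise = caseY
      where
        caseX = table ! (i + 1, j)
        caseY = table ! (i, j + 1)
```

For example, `lcsArray "abcde" "ace"` gives `"ace"`. The `length` calls are kept only to mirror the question's code; storing each cell's length next to its list would avoid recomputing them.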