Question

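fgl is a Haskell library for graph manipulation. It ships with an implementation of its base classes, Data.Graph.Inductive.PatriciaTree, that is supposedly highly tuned for performance. Part of that tuning involves ghc RULES pragmas that replace certain generic functions with much faster specialized versions.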

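However, my evidence is that these RULES don't seem to work at all, and I don't understand why not. For anyone trying to replicate exactly what I see, I've put my test project at https://github.com/fizbin/GraphOptiTest and I'm using ghc version 7.10.2.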

Here is my test program:

{-# LANGUAGE TupleSections #-}

module Main where

import Control.Exception
import Control.Monad
import Data.Graph.Inductive
import qualified Data.Graph.Inductive.PatriciaTree as Pt
import qualified MyPatriciaTree as MPt

makeGraph :: (DynGraph gr) => Int -> gr () Int
makeGraph n = mkGraph (map (,()) [1 .. n])
  (concatMap (\x -> map (\y -> (x, y, x*y)) [x .. n]) [1 .. n])

main1 :: IO ()
main1 =
  replicateM_ 200 $ let x = makeGraph 200 :: Pt.Gr () Int
                    in evaluate (length $ show x)

main2 :: IO ()
main2 =
  replicateM_ 200 $ let x = makeGraph 200 :: MPt.Gr () Int
                    in evaluate (length $ show x)

main :: IO ()
main = main1 >> main2

Now, Data.Graph.Inductive.PatriciaTree has this definition for the class function mkGraph:

    mkGraph vs es   = insEdges es
                      . Gr
                      . IM.fromList
                      . map (second (\l -> (IM.empty,l,IM.empty)))
                      $ vs

where insEdges is a function defined in the module Data.Graph.Inductive.Graph:

insEdges :: (DynGraph gr) => [LEdge b] -> gr a b -> gr a b
insEdges es g = foldl' (flip insEdge) g es

And Data.Graph.Inductive.PatriciaTree has this to say about insEdge:

{-# RULES
      "insEdge/Data.Graph.Inductive.PatriciaTree"  insEdge = fastInsEdge
  #-}
fastInsEdge :: LEdge b -> Gr a b -> Gr a b
fastInsEdge (v, w, l) (Gr g) = g2 `seq` Gr g2
  where
    g1 = IM.adjust addSucc' v g
    g2 = IM.adjust addPred' w g1

    addSucc' (ps, l', ss) = (ps, l', IM.insertWith addLists w [l] ss)
    addPred' (ps, l', ss) = (IM.insertWith addLists v [l] ps, l', ss)

So, in theory, when I run main1 in my test program, it should compile into something that ultimately calls fastInsEdge.

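To test this, I compared against a modified version of Data.Graph.Inductive.PatriciaTree that uses the following definition of the mkGraph method (this is the module MyPatriciaTree used in main2 above):

    mkGraph vs es   = doInsEdges
                      . Gr
                      . IM.fromList
                      . map (second (\l -> (IM.empty,l,IM.empty)))
                      $ vs
      where
        doInsEdges g = foldl' (flip fastInsEdge) g es

When I run my test program (after cabal configure --enable-library-profiling --enable-executable-profiling and cabal build GraphOptiTest), main2 smokes main1. It's not even close: the profile shows that 99.2% of the program's time is spent inside main1. (And changing the program to run only main2 shows that yes, main2 really is that fast on its own.)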

Yes, I do have -O in the ghc-options section of my cabal file.

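Trying ghc options like -ddump-rule-firings doesn't really help. All I can see is that these replacement rules aren't firing, but I don't understand why, and I don't know how to get the compiler to tell me why it didn't activate a replacement rule.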
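One way to get familiar with the dump output is to try it on a tiny module where the rule is known to be eligible. The following is a minimal sketch, not fgl code: slowLength and fastLength are hypothetical names made up for the illustration, while -O and -ddump-rule-firings are ordinary GHC flags set here through an OPTIONS_GHC pragma.

```haskell
{-# OPTIONS_GHC -O -ddump-rule-firings #-}
module Main where

-- Hypothetical "generic" function that we would like to replace.
slowLength :: [a] -> Int
slowLength []       = 0
slowLength (_ : xs) = 1 + slowLength xs
-- Keep the name intact until the final simplifier phase so the rule
-- below has a chance to match before any inlining happens.
{-# NOINLINE [0] slowLength #-}

-- Hypothetical replacement that computes the same result.
fastLength :: [a] -> Int
fastLength = length

-- Rewrite every use of slowLength to fastLength (only with optimization on).
{-# RULES "slowLength/fast" slowLength = fastLength #-}

main :: IO ()
main = print (slowLength "hello")
```

When this is compiled (not interpreted), GHC reports the rules that fired during simplification, so the rule name appearing or not appearing in the dump tells you whether the rewrite took place. The program prints the same number either way, because a rule must not change the result.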

Some discoveries made by messing with the source of fgl, in response to @dfeuer's answer:

If I add a specialized version of insEdges to Data.Graph.Inductive.PatriciaTree:

{-# RULES
      "insEdges/Data.Graph.Inductive.PatriciaTree"  insEdges = fastInsEdges
  #-}
fastInsEdges :: [LEdge b] -> Gr a b -> Gr a b
fastInsEdges es g = foldl' (flip fastInsEdge) g es

then both main1 and main2 are fast. This replacement rule fires; why doesn't the other one? (And no, telling ghc to NOINLINE the insEdge function does no good.)

EPILOGUE:

There is now a bug filed against the fgl package about its failure to mark the functions that use insEdge and insNode so that the fast versions get used. In my own code I now work around the issue, and since the workaround may be useful in other situations, I thought I'd share it. At the top of my code, I have:

import Data.List (foldl')
import qualified Data.Graph.Inductive as G
import qualified Data.Graph.Inductive.PatriciaTree as Pt

-- Work around design and implementation performance issues
-- in the Data.Graph.Inductive package.
-- Specifically, the tuned versions of insNode, insEdge, gmap, nmap, and emap
-- for PatriciaTree graphs are exposed only through RULES pragmas, meaning
-- that you only get them when the compiler can specialize the function
-- to that specific instance of G.DynGraph. Therefore, I create my own
-- type class with the functions that have specialized versions and use that
-- type class here; the compiler then can do the specialized RULES
-- replacement on the Pt.Gr instance of my class.

class (G.DynGraph gr) => MyDynGraph gr where
  mkGraph  :: [G.LNode a] -> [G.LEdge b] -> gr a b
  insNodes :: [G.LNode a] -> gr a b -> gr a b
  insEdges :: [G.LEdge b] -> gr a b -> gr a b
  insNode  :: G.LNode a -> gr a b -> gr a b
  insEdge  :: G.LEdge b -> gr a b -> gr a b
  gmap     :: (G.Context a b -> G.Context c d) -> gr a b -> gr c d
  nmap     :: (a -> c) -> gr a b -> gr c b
  emap     :: (b -> c) -> gr a b -> gr a c

instance MyDynGraph Pt.Gr where
  mkGraph nodes edges = insEdges edges $ G.mkGraph nodes []
  insNodes vs g = foldl' (flip G.insNode) g vs
  insEdges es g = foldl' (flip G.insEdge) g es
  insNode = G.insNode
  insEdge = G.insEdge
  gmap = G.gmap
  nmap = G.nmap
  emap = G.emap

(If I used the nemap function in my code, I would include it in the class too.) Any code of mine that was formerly written in terms of (G.DynGraph gr) => ... is now written in terms of (MyDynGraph gr) => .... The compiler RULES activate for the Pt.Gr instance, and I get the optimized version of each function.

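Essentially, this trades away some of the compiler's ability to inline any of these functions into the calling code (and possibly perform other optimizations) in exchange for always getting the optimized versions. There is also the cost of an extra pointer indirection at runtime, but that is trivial by comparison. Since profiling showed that those other optimizations never yielded anything significant anyway, this was a clear net win in my case.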

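Many people's code could aggressively use SPECIALIZE pragmas to get the optimized versions everywhere. Sometimes that isn't possible, though, and it wasn't in the actual production code that led to my question, at least not without refactoring huge chunks of the application. I had a data structure with a member of type (forall gr. G.DynGraph gr => tokType -> gr () (MyEdge c)). That member now uses MyDynGraph as the class constraint, but unrolling it completely so that there is no forall gr. in the signature would have been a huge effort, and a signature like that prevents specialization from working across the boundary.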
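To make the boundary problem concrete, here is a small sketch of a record field with a rank-2 type. Everything in it (Bag, ListBag, Builder, countdown) is a hypothetical stand-in for the real graph class and data structure, which are not shown in the question.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

-- Hypothetical stand-in for a graph class such as G.DynGraph.
class Bag f where
  fromL :: [a] -> f a
  toL   :: f a -> [a]

newtype ListBag a = ListBag [a]

instance Bag ListBag where
  fromL = ListBag
  toL (ListBag xs) = xs

-- The stored function has to work for every Bag instance, so it is
-- compiled once and receives the class dictionary as a runtime argument.
newtype Builder = Builder
  { runBuilder :: forall f. Bag f => Int -> f Int }

countdown :: Builder
countdown = Builder (\n -> fromL [n, n - 1 .. 1])

main :: IO ()
main =
  -- The caller chooses ListBag here, but the closure inside countdown was
  -- built without knowing that, so it cannot be specialized at this site.
  print (toL (runBuilder countdown 3 :: ListBag Int))
```

Because the instance is only chosen where runBuilder is applied, a type-specific rewrite rule has nothing to match inside the stored function. Routing the calls through a class method, as in the workaround above, moves the type-specific code into the instance itself.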

Answer 1

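I haven't done any experiments yet, but here's my guess. The insEdge function is not marked (phased) INLINE or NOINLINE, so the inliner is free to inline it whenever it is fully applied. In the definition of insEdges, we see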

foldl' (flip insEdge) g es

Inlining foldl' gives

foldr f' id es g
  where f' x k z = k $! flip insEdge z x

flip is now fully applied, so we can inline it:

foldr f' id es g
  where f' x k z = k $! insEdge x z

Now insEdge is fully applied, so GHC may choose to inline it right there, before the rule ever gets a chance to fire.
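The foldr form used in the steps above can be checked against Data.List.foldl' directly. This is a small sketch in which foldlViaFoldr is a made-up name; it mirrors the shape of the expression in the answer rather than quoting the library source.

```haskell
module Main where

import Data.List (foldl')

-- A strict left fold written as a right fold that builds a function:
-- each element x wraps the continuation k, and the accumulator z is
-- forced with ($!) before being passed on.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z0 xs = foldr step id xs z0
  where
    step x k z = k $! f z x

main :: IO ()
main = do
  let xs = [1 .. 10] :: [Int]
  -- Both folds should agree on every input.
  print (foldlViaFoldr (flip (:)) [] xs == foldl' (flip (:)) [] xs)
  print (foldlViaFoldr (-) 0 xs == foldl' (-) 0 xs)
```

Substituting flip insEdge for f in step gives k $! flip insEdge z x, which is the point where flip, and then insEdge, become fully applied.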

Try adding {-# NOINLINE [0] insEdge #-} next to the definition of insEdge and see what happens. If it works, submit a pull request to fgl.

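P.S. In my opinion, this sort of thing should really be done with class methods that have default definitions, rather than with rewrite rules. Rules are always a bit finicky.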
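As an illustration of the default-method approach, here is a sketch with a hypothetical Bag class (not fgl's API). The bulk operation has a generic default, and an instance overrides it with a tuned version that is picked by ordinary instance resolution, with no rewrite rule involved.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical class: insertMany has a generic default written in terms
-- of insertOne, and any instance may override it.
class Bag f where
  emptyBag   :: f a
  insertOne  :: a -> f a -> f a
  toL        :: f a -> [a]

  insertMany :: [a] -> f a -> f a
  insertMany xs b = foldl' (flip insertOne) b xs

newtype ListBag a = ListBag [a]

instance Bag ListBag where
  emptyBag = ListBag []
  insertOne x (ListBag xs) = ListBag (x : xs)
  toL (ListBag xs) = xs
  -- Tuned override: one reverse-and-append instead of n single inserts.
  -- It must produce the same result as the default definition.
  insertMany ys (ListBag xs) = ListBag (reverse ys ++ xs)

main :: IO ()
main = print (toL (insertMany "abc" (ListBag "z")))
```

Generic code that calls insertMany through the class gets the override whenever the dictionary is for ListBag, whether or not the call site was specialized.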

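As the comments reveal, the biggest problem was not premature inlining but a failure to specialize insEdge. In particular, Data.Graph.Inductive.Graph does not export an unfolding for insEdges, so it is impossible to specialize it, and the insEdge it calls, to the appropriate type. The ultimate fix was to mark insEdges INLINABLE, but I would still recommend marking insEdge NOINLINE [0] out of an abundance of caution.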
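The pragma placement described above can be sketched in miniature. The names here (Bag, ListBag, insertOne, insertMany, fastInsertOne) are hypothetical stand-ins for DynGraph, Gr, insEdge, insEdges and fastInsEdge. The sketch lives in a single module, whereas in fgl the generic functions and the rule are in different modules, which is exactly why the exported unfolding matters there.

```haskell
module Main where

import Data.List (foldl')

class Bag f where
  emptyBag :: f a
  cons     :: a -> f a -> f a
  toL      :: f a -> [a]

newtype ListBag a = ListBag [a]

instance Bag ListBag where
  emptyBag = ListBag []
  cons x (ListBag xs) = ListBag (x : xs)
  toL (ListBag xs) = xs

-- Plays the role of insEdge: generic and not a class method.
-- NOINLINE [0] keeps the name around until the last simplifier phase,
-- so a rule that mentions it can still match.
insertOne :: Bag f => a -> f a -> f a
insertOne = cons
{-# NOINLINE [0] insertOne #-}

-- Plays the role of insEdges. INLINABLE exports the unfolding, so a
-- client module can specialize it (and the insertOne call inside it).
insertMany :: Bag f => [a] -> f a -> f a
insertMany xs b = foldl' (flip insertOne) b xs
{-# INLINABLE insertMany #-}

-- Plays the role of fastInsEdge.
fastInsertOne :: a -> ListBag a -> ListBag a
fastInsertOne x (ListBag xs) = ListBag (x : xs)

-- Only matches where insertOne is used at the ListBag type.
{-# RULES "insertOne/ListBag" insertOne = fastInsertOne #-}

main :: IO ()
main = print (toL (insertMany [1, 2, 3 :: Int] (emptyBag :: ListBag Int)))
```

The rule can only match once insertMany has been specialized to ListBag, because before that the insertOne call inside it is still polymorphic in f.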

Library ghc RULES don't activate
