Using Data.Generic.Diff to generate output with "inserted"/"deleted" markup

Posted: 2019-03-26 10:35:18

Tags: haskell tree diff

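I'm using the Haskell gdiff package to compute the differences between trees. The diff algorithm outputs an "edit script" that describes a sequence of operations transforming the "before" tree into the "after" tree. gdiff provides a patch function that applies an edit script to the "before" tree, producing the "after" tree.

What I need is a modified patch operation whose output is the "after" tree with the modifications highlighted.

For example, suppose the trees are document ASTs. I want to produce output that shows the insertions and deletions inline in the "after" document.

So far I've written a program that successfully uses gdiff to compute the diff between two instances of a simple binary tree data structure. What I can't figure out is how to modify the resulting edit script so that "inserted" and "deleted" markers are injected when the patch is applied.

Can anyone help?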

Diffing two binary trees

Here is my binary tree data structure:

data Tree = Node String Tree Tree
          | Empty
          deriving Show

Here are my example "before" and "after" trees:

before :: Tree
before =
  Node "root"
    (Node "A"
      (Empty)
      (Empty)
    )
    (Empty)

after :: Tree
after =
  Node "root"
    (Node "A"
      (Node "B" Empty Empty)
      (Empty)
    )
    (Empty)

The diff is run as follows:

runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after

main :: IO ()
main = do
  putStrLn ("before     = " ++ (show before))
  putStrLn ("after      = " ++ (show after))

  let edit = runDiff
  putStrLn ("edit       = " ++ (show edit))

  let compressed = compress edit
  putStrLn ("compressed = " ++ (show compressed))

  let result = patch edit before
  putStrLn ("result     = " ++ (show result))
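The signature on runDiff does real work. In gdiff, diff has a type of roughly Type f x => x -> x -> EditScript f x x, so the family f appears only in the result type and nothing in the Tree arguments determines it. GHC will not pick an instance merely because it is the only one in scope. The following self-contained analogue reproduces that situation without gdiff. Script, Fam, mkScript and TreeFam are hypothetical stand-ins for EditScript, Type, diff and TreeFamily, not the real API.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeApplications #-}

-- Hypothetical stand-ins (NOT gdiff): Script f x plays the role of
-- EditScript f x x, and Fam f x plays the role of Type f x.
newtype Script f x = Script [String]

class Fam f x where
  -- f occurs only in the result type, just like the family in diff.
  mkScript :: x -> x -> Script f x

-- The only "family" in scope.
data TreeFam

instance Fam TreeFam Bool where
  mkScript a b = Script [show a, show b]

steps :: Script f x -> [String]
steps (Script s) = s

-- Fix 1: a type signature pins down f (what runDiff's signature does).
withSig :: Script TreeFam Bool
withSig = mkScript True False

-- Fix 2: a visible type application pins down f.
withApp :: [String]
withApp = steps (mkScript @TreeFam True False)

-- Rejected if uncommented: f is ambiguous even though TreeFam is the
-- only instance in scope, because type classes are open.
-- bad = steps (mkScript True False)

main :: IO ()
main = do
  print (steps withSig)
  print withApp
```

Both fixes resolve the ambiguity. The commented-out line fails with an "ambiguous type variable" error, which is why dropping the signature from runDiff would break the program.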

(I'll come back to the definition of TreeFamily later.)

The output is:

before     = Node "root" (Node "A" Empty Empty) Empty
after      = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
edit       = Cpy Node $ Cpy "root" $ Cpy Node $ Cpy "A" $ Ins Node $ Ins "B" $ Cpy Empty $ Cpy Empty $ Cpy Empty $ Ins Empty $ End
compressed = Cpy Node $ CpyTree $ Cpy Node $ CpyTree $ Ins Node $ Ins "B" $ CpyTree $ CpyTree $ CpyTree $ Ins Empty $ End
result     = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
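The printed edit script is easier to follow once you notice that it appears to walk both trees in preorder. Each step handles one constructor, and each String label gets a step of its own (hence Cpy "root"). As a rough illustration only, here is an untyped toy model of that idea in plain Haskell. It does not use gdiff, and Tok, Op, flatten, unflatten, applyOps and patchToy are hypothetical names. Cpy and Del consume a matching token from the flattened "before" tree, Ins consumes nothing, and the output stream is rebuilt into a tree.

```haskell
-- Toy, untyped model of a preorder edit script (NOT gdiff's API).
data Tree = Node String Tree Tree | Empty deriving (Show, Eq)

-- One token per constructor in preorder; the String label is its own
-- token, like the separate Cpy "root" step in gdiff's printed script.
data Tok = TNode | TLabel String | TEmpty deriving (Show, Eq)

data Op = Cpy Tok | Ins Tok | Del Tok deriving Show

-- Preorder linearisation of a tree.
flatten :: Tree -> [Tok]
flatten Empty        = [TEmpty]
flatten (Node s l r) = TNode : TLabel s : flatten l ++ flatten r

-- Rebuild one tree from the front of a preorder stream, returning the rest.
unflatten :: [Tok] -> Maybe (Tree, [Tok])
unflatten (TEmpty : rest) = Just (Empty, rest)
unflatten (TNode : TLabel s : rest) = do
  (l, rest1) <- unflatten rest
  (r, rest2) <- unflatten rest1
  Just (Node s l r, rest2)
unflatten _ = Nothing

-- Cpy and Del must match the next source token; Ins consumes nothing.
applyOps :: [Op] -> [Tok] -> Maybe [Tok]
applyOps [] [] = Just []
applyOps (Cpy t : ops) (x : xs) | t == x = (t :) <$> applyOps ops xs
applyOps (Del t : ops) (x : xs) | t == x = applyOps ops xs
applyOps (Ins t : ops) xs = (t :) <$> applyOps ops xs
applyOps _ _ = Nothing

patchToy :: [Op] -> Tree -> Maybe Tree
patchToy ops t = do
  toks <- applyOps ops (flatten t)
  (t', []) <- unflatten toks  -- fail on leftover tokens
  Just t'

before, after :: Tree
before = Node "root" (Node "A" Empty Empty) Empty
after  = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty

-- The question's uncompressed script, transcribed into the toy Op type.
script :: [Op]
script =
  [ Cpy TNode, Cpy (TLabel "root"), Cpy TNode, Cpy (TLabel "A")
  , Ins TNode, Ins (TLabel "B"), Cpy TEmpty, Cpy TEmpty, Cpy TEmpty
  , Ins TEmpty ]

main :: IO ()
main = print (patchToy script before == Just after)  -- prints True
```

Replaying the transcribed script against before reproduces after. In this model, a "marked-up" patch amounts to deciding what to emit for each Ins step instead of a plain token.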

Proposed strategy: process the edit script

I think I can implement the "generate a marked-up after tree" operation by processing the edit script so that ... $ Ins Node $ ... is replaced with ... $ Ins InsNode $ ..., where InsNode is an additional Tree constructor:

data Tree = Node String Tree Tree
          | InsNode String Tree Tree
          | Empty
          deriving Show

(Deletions would be handled similarly, but this post focuses on insertions only.)

The processed edit script would then be fed into gdiff's existing patch function.

TreeFamily definition

The gdiff library requires the user to define a "family" data type. Here is my definition. Note that I've included the InsNode constructor. Although it never appears in the input data, I think gdiff needs to know about it in order to perform the Node-to-InsNode substitution described above.

data TreeFamily :: * -> * -> * where
  Node'    :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
  InsNode' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
  String'  :: String -> TreeFamily String Nil
  Empty'   :: TreeFamily Tree Nil

instance Family TreeFamily where
  decEq Node' Node' = Just (Refl, Refl)
  decEq InsNode' InsNode' = Just (Refl, Refl)
  decEq (String' s1) (String' s2)
    | s1 == s2  = Just (Refl, Refl)
    | otherwise = Nothing
  decEq Empty' Empty' = Just (Refl, Refl)
  decEq _ _ = Nothing

  fields Node' (Node s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
  fields InsNode' (InsNode s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
  fields (String' _) _ = Just CNil
  fields Empty' Empty = Just CNil
  fields _ _ = Nothing

  apply Node' (CCons s (CCons t1 (CCons t2 CNil))) = Node s t1 t2
  apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
  apply (String' s) CNil = s
  apply Empty' CNil = Empty

  string Node' = "Node"
  string InsNode' = "InsNode"
  string (String' s) = show s
  string Empty' = "Empty"

instance Type TreeFamily Tree where
  constructors = [ Concr Node', Concr InsNode', Concr Empty' ]

instance Type TreeFamily String where
  constructors = [ Abstr String' ]

First attempt at a processEdit function

The function that processes the EditScript to perform the Node-to-InsNode substitution should have the same signature as the compress function, i.e.:

processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys

I can write the following identity function...

processEdit End = End
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)

...but I don't know how to modify the Ins equation to perform the substitution. Can anyone help?

Full test program for reference

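Answer (score: 0)

Just specialize processEdit to TreeFamily (what you're trying to do is clearly specific to TreeFamily) and add an equation that matches Node' inside Ins:

processEdit :: EditScriptL TreeFamily txs tys -> EditScriptL TreeFamily txs tys
processEdit End = End
processEdit (Ins Node' d) = Ins InsNode' (processEdit d)
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)

However, I don't like this approach. It requires modifying the original data type, and you lose the type-level distinction between "original" and "patched" trees. A better solution would be to create another data type (e.g. ChangedTree) and reimplement patch as patch' :: EditScript TreeFamily Tree Tree -> Tree -> ChangedTree. If you're tracking both insertions and deletions, will you also need a "replacement" kind of change?

Oh, and runDiff needs a type signature, because otherwise it doesn't know which Type _ Tree instance to use. For example, diff @TreeFamily before after (with the TypeApplications extension) will fix this. Haskell's type classes are open, so GHC won't automatically infer that you want instance Type TreeFamily Tree rather than some other instance Type XXX Tree. The fact that it can't currently see any other suitable instance in scope doesn't mean it will guess which one you intended.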
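The ChangedTree suggestion can be illustrated with a toy model that does not use gdiff. Like the question, this sketch handles insertions only. It leaves Tree unchanged (no InsNode constructor) and produces a separate ChangedTree in which every node and leaf carries a Kept or Inserted mark. ChangedTree, Mark, Tok, Op and patchMarked are hypothetical names. Op is a simplified, untyped stand-in for EditScriptL, not the real type, and the script is assumed to run over a preorder token stream.

```haskell
-- Toy, untyped sketch (NOT gdiff): patch into a separate marked-up type.
data Tree = Node String Tree Tree | Empty deriving Show

-- Hypothetical result type; Tree itself needs no InsNode constructor.
data Mark = Kept | Inserted deriving (Show, Eq)

data ChangedTree
  = CNode Mark String ChangedTree ChangedTree
  | CEmpty Mark
  deriving (Show, Eq)

-- Preorder tokens; a String label is its own token, as in gdiff's output.
data Tok = TNode | TLabel String | TEmpty deriving (Show, Eq)

-- Insert-only stand-in for an edit script (no Del, as in the question).
data Op = Cpy Tok | Ins Tok deriving Show

flatten :: Tree -> [Tok]
flatten Empty        = [TEmpty]
flatten (Node s l r) = TNode : TLabel s : flatten l ++ flatten r

-- Run the script over the source tokens, tagging every output token.
tagged :: [Op] -> [Tok] -> Maybe [(Mark, Tok)]
tagged [] [] = Just []
tagged (Cpy t : ops) (x : xs) | t == x = ((Kept, t) :) <$> tagged ops xs
tagged (Ins t : ops) xs = ((Inserted, t) :) <$> tagged ops xs
tagged _ _ = Nothing

-- Rebuild from the tagged preorder stream. A node takes the mark of its
-- Node token; the label's own mark is ignored (a simplifying assumption).
build :: [(Mark, Tok)] -> Maybe (ChangedTree, [(Mark, Tok)])
build ((m, TEmpty) : rest) = Just (CEmpty m, rest)
build ((m, TNode) : (_, TLabel s) : rest) = do
  (l, rest1) <- build rest
  (r, rest2) <- build rest1
  Just (CNode m s l r, rest2)
build _ = Nothing

patchMarked :: [Op] -> Tree -> Maybe ChangedTree
patchMarked ops t = do
  toks <- tagged ops (flatten t)
  (ct, []) <- build toks  -- fail on leftover tokens
  Just ct

before :: Tree
before = Node "root" (Node "A" Empty Empty) Empty

-- The question's uncompressed script, transcribed into the toy Op type.
script :: [Op]
script =
  [ Cpy TNode, Cpy (TLabel "root"), Cpy TNode, Cpy (TLabel "A")
  , Ins TNode, Ins (TLabel "B"), Cpy TEmpty, Cpy TEmpty, Cpy TEmpty
  , Ins TEmpty ]

main :: IO ()
main = print (patchMarked script before)
```

With the question's script, this prints Just (CNode Kept "root" (CNode Kept "A" (CNode Inserted "B" (CEmpty Kept) (CEmpty Kept)) (CEmpty Kept)) (CEmpty Inserted)). Node "B" is marked as inserted. However, B's two Empty children come out as Kept, and the root's right Empty comes out as Inserted, because the script copies the existing leaves into new positions. Any inline highlighting built this way inherits whatever alignment the diff chose.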