I have the following XML:
<tree>
<leaf id="1"/>
<leaf id="1"/>
</tree>
What I want to do is remove the duplicate <leaf/> elements (across the whole XML document) and replace them with a single <new-leaf/>, like this:
<tree>
<new-leaf id="1"/>
</tree>
I wrote the following RewriteRule, which I thought should accomplish this (forgive the statefulness):
import scala.xml._
import scala.xml.transform._

class UniqueLeaves extends RewriteRule {
  var leafIds = Set.empty[String]

  override def transform(node: Node): Seq[Node] = node match {
    case e: Elem if ((e.label == "leaf") && !leafIds.contains((e \\ "@id").text)) => {
      leafIds += (e \\ "@id").text
      <new-leaf id={(e \\ "@id")} />
    }
    case e: Elem if (e.label == "leaf") => Seq.empty
    case _ => node
  }
}
Unfortunately, using RuleTransformer gives me the following:
scala> val tree = <tree><leaf id="1"/><leaf id="1"/></tree>
scala> println(new RuleTransformer(new UniqueLeaves).transform(tree))
<tree/>
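I assume this is because RuleTransformer calls transform on the RewriteRule multiple times for the same node, and the output comes from a call other than the first one. By that point the id is already in leafIds, so my match returns an empty Seq instead of the <new-leaf> node.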
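One way to check that assumption is to run a rule that changes nothing and only records which elements it is asked to transform. The sketch below is not part of my original code: CountingRule and CountCalls are hypothetical names, and how many times each label shows up may depend on the scala-xml version in use, so no expected output is given.

```scala
import scala.xml._
import scala.xml.transform._

// Hypothetical diagnostic rule: returns every node unchanged and
// records the label of each element passed to transform.
class CountingRule extends RewriteRule {
  var calls = Vector.empty[String]

  override def transform(node: Node): Seq[Node] = {
    node match {
      case e: Elem => calls :+= e.label
      case _       => // text, comments, etc. are not recorded
    }
    node
  }
}

object CountCalls {
  // Runs the counting rule over the tree and returns the recorded labels.
  def labelsVisited(tree: Node): Vector[String] = {
    val rule = new CountingRule
    new RuleTransformer(rule).transform(tree)
    rule.calls
  }
}
```

If "leaf" appears more often in CountCalls.labelsVisited(tree) than there are <leaf/> elements in the tree, the same node is being transformed more than once.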
Any tips on getting this to work (and on making it non-stateful) would be greatly appreciated.
Answer 0 (score: 2)
For anyone with a similar problem, I found the following solution:
import scala.xml._

def removeDuplicates(tree: Node): Node = {
  // ids of the <leaf/> elements seen so far (mutable local state)
  var ids = Set.empty[String]

  def recurse(node: Node): Seq[Node] = node match {
    case e: Elem if e.label == "leaf" =>
      val id = (e \\ "@id").text
      if (ids.contains(id)) Seq.empty // duplicate: drop it
      else {
        ids = ids + id
        <new-leaf id={id}/> // first occurrence: replace it
      }
    // any other element: rebuild it from its transformed children
    case e: Elem => e.copy(child = e.nonEmptyChildren.flatMap(recurse))
    case _ => node
  }

  recurse(tree).head
}
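This works because it handles the node traversal manually instead of going through RuleTransformer#transform, so it never visits the same node more than once (unfortunately, it is still stateful).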
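The remaining state could probably be removed by threading the set of seen ids through the recursion instead of keeping it in a var. The following is only a sketch of that idea, not something taken from the solution above: DedupLeaves and dedup are hypothetical names, and it assumes the id sits directly on each <leaf/> element.

```scala
import scala.xml._

object DedupLeaves {
  // Returns the rewritten node(s) together with the ids seen so far,
  // so no mutable variable is needed.
  def dedup(node: Node, seen: Set[String]): (Seq[Node], Set[String]) = node match {
    case e: Elem if e.label == "leaf" =>
      val id = (e \ "@id").text
      if (seen(id)) (Seq.empty, seen)            // duplicate: drop it
      else (Seq(<new-leaf id={id}/>), seen + id) // first occurrence: replace it
    case e: Elem =>
      // Fold over the children in document order, passing the set along.
      val (children, seenAfter) =
        e.child.foldLeft((Vector.empty[Node], seen)) { case ((acc, s), c) =>
          val (out, s2) = dedup(c, s)
          (acc ++ out, s2)
        }
      (Seq(e.copy(child = children)), seenAfter)
    case other => (Seq(other), seen)
  }

  def removeDuplicates(tree: Node): Node = dedup(tree, Set.empty)._1.head
}
```

Calling DedupLeaves.removeDuplicates(tree) on the example tree from the question should leave a single <new-leaf/> under <tree>. The price is that every call returns a pair, which is somewhat noisier than the version with a local var.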