Removing duplicates and renaming tags with scala.xml.transform.RuleTransformer

Asked: 2014-01-27 21:22:02

Tags: xml scala

I have the following XML:

<tree>
  <leaf id="1"/>
  <leaf id="1"/>
</tree>

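What I want to do is remove duplicate <leaf/> elements (across the whole XML document, matched by id) and keep a single <new-leaf/> in their place, like this: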

<tree>
  <new-leaf id="1"/>
</tree>

I wrote the following RewriteRule, which I thought would accomplish this (forgive the statefulness):

import scala.xml._
import scala.xml.transform._

class UniqueLeaves extends RewriteRule {

  var leafIds = Set.empty[String]

  override def transform(node: Node): Seq[Node] = node match {
    case e: Elem if ((e.label == "leaf") && !leafIds.contains((e \\ "@id").text)) => {
      leafIds += (e \\ "@id").text
      <new-leaf id={(e \\ "@id")} />
    }
    case e: Elem if (e.label == "leaf") => Seq.empty
    case _ => node
  }

}

Unfortunately, running it through RuleTransformer gives me the following:

scala> val tree = <tree><leaf id="1"/><leaf id="1"/></tree>
scala> println(new RuleTransformer(new UniqueLeaves).transform(tree))
<tree/>

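I assume this is because RuleTransformer calls transform on the RewriteRule multiple times for the same node, and the result that ends up in the output comes from a call after the first one. By then the id is already in leafIds, so the node falls through to the case that returns an empty Seq instead of producing <new-leaf>.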

Any hints on getting this to work (ideally without mutable state) would be much appreciated.

1 Answer:

Answer 0: (score: 2)

For anyone with a similar problem, I found the following solution:

def removeDuplicates(tree: Node): Node = {
  var ids = Set.empty[String]
  def recurse(node: Node): Seq[Node] = node match {
    // A <leaf>: emit <new-leaf> the first time an id is seen, drop it afterwards.
    case e: Elem if e.label == "leaf" =>
      val id = (e \\ "@id").text
      if (ids.contains(id)) Seq.empty
      else {
        ids += id
        <new-leaf id={id}/>
      }
    // Any other element: rebuild it from its recursively processed children.
    case e: Elem => e.copy(child = e.nonEmptyChildren.flatMap(recurse))
    case _ => node
  }
  recurse(tree).head
}

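This works because it handles the node traversal manually instead of going through RuleTransformer#transform, so each node is visited exactly once. (Unfortunately, it is still stateful.)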
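
Since the question also asked for a non-stateful version, here is a minimal sketch of one way to get there, assuming the same scala.xml API as above. Instead of a var, the set of ids seen so far is passed into each call and returned alongside the rewritten nodes. The names dedupe and removeDuplicatesPure are hypothetical and not part of the original answer or of the scala.xml library.

```scala
import scala.xml._

// Hypothetical helper (an assumption, not from the original post): returns the
// rewritten nodes together with the updated set of ids seen so far.
def dedupe(node: Node, seen: Set[String]): (Seq[Node], Set[String]) = node match {
  // A <leaf>: drop it if its id was already seen, otherwise emit <new-leaf>.
  case e: Elem if e.label == "leaf" =>
    val id = (e \ "@id").text
    if (seen.contains(id)) (Seq.empty, seen)
    else (Seq(<new-leaf id={id}/>), seen + id)

  // Any other element: fold over the children in document order, threading
  // the set of seen ids from one child to the next.
  case e: Elem =>
    val (children, seenAfter) =
      e.child.foldLeft((Vector.empty[Node], seen)) {
        case ((acc, s), c) =>
          val (out, s2) = dedupe(c, s)
          (acc ++ out, s2)
      }
    (Seq(e.copy(child = children)), seenAfter)

  // Text, comments, etc. are kept unchanged.
  case other => (Seq(other), seen)
}

// Hypothetical entry point: start with an empty set and keep only the node.
def removeDuplicatesPure(tree: Node): Node =
  dedupe(tree, Set.empty)._1.head
```

Calling removeDuplicatesPure(<tree><leaf id="1"/><leaf id="1"/></tree>) should yield a tree containing a single new-leaf with id 1. The traversal is the same as in the answer above; the only difference is that the bookkeeping lives in the return value rather than in a captured variable.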