Scala: Removing nodes from XML at different levels

Time: 2015-01-15 10:18:12

Tags: xml scala

My XML looks like this (it is a NodeSeq):

<first>...</first>
<second>...</second>
<third>
    <foo>
        <keepattr> ... </keepattr>
        <otherattr1> ... </otherattr1>
    </foo>
    <otherattr2> ... </otherattr2>
</third>

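I need to keep <first>, remove <second> and everything inside it, and inside <third> keep only <keepattr>, while preserving the structure of the data (that is, keeping the enclosing <foo> tag).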

How can I do this in Scala?

I tried this, but I am stuck at a single level:

val removeJunk = new RewriteRule {
  override def transform(node: Node): NodeSeq = node match {
    case e: Elem => e.label match {
      case "second" => NodeSeq.Empty
      case "third" => //?
    }
    case o => o

  }
}

I may also need to go several levels further down in the schema.

Edit: I want to keep the data without breaking the data model:

<third>
    <foo>
      <keepattr> ... </keepattr> 
      <otherattr1> ... </otherattr1>
    </foo>
    <otherattr2> ... </otherattr2>
</third>

should become

<third>
    <foo>
      <keepattr> ... </keepattr> 
    </foo>
</third>

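2 answers:

Answer 0 (score: 2)

You can use a combination of filterNot and a RewriteRule. Because the \\ operator (a search over all descendants) is used at every step, this may be inefficient, but I can't think of any other solution right now: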

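import scala.xml._
import scala.xml.transform.RewriteRule

val input: NodeBuffer = <first>foo</first>
  <second>remove me</second>
  <third>
    <foo>
      <keepattr>meh</keepattr>
      <otherattr1>bar</otherattr1>
    </foo>
    <otherattr2>quux</otherattr2>
  </third>

val extractKeepAttr = new RewriteRule {
  override def transform(node: Node): NodeSeq = node match {
    case e: Elem => e.label match {
      // keepattr itself is kept unchanged
      case "keepattr" => e
      // an ancestor of keepattr: keep only the children that lead to a keepattr, then recurse into them
      case _ if (e \\ "keepattr").nonEmpty =>
        e.copy(child = e.child.filter(c => (c \\ "keepattr").nonEmpty).flatMap(c => transform(c)))
      // elements that contain no keepattr at all (such as <first>) are left as they are
      case _ => e
    }
    // non-element nodes (text, comments) pass through unchanged
    case other => other
  }
}

// NodeBuffer is an ArrayBuffer[Node]: filterNot drops the top-level <second>, and transform
// here is the buffer's own method, which applies the rule as a Node => Node function
// returns <first>foo</first>, <third><foo><keepattr>meh</keepattr></foo></third>
val updatedXml = input.filterNot(_.label == "second").transform(extractKeepAttr)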

Edit: updated the answer.
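As a possible way around the repeated \\ lookups mentioned above, here is a minimal sketch of a single bottom-up pass. It is not part of the original answer: the names PruneSketch, prune and clean are hypothetical, and it assumes the scala-xml module is on the classpath. Each element is visited once, and an ancestor is kept only if at least one of its children survives.

```scala
import scala.xml._

object PruneSketch {
  // Hypothetical helper: keeps `keep` elements whole, and keeps an ancestor
  // only if at least one of its children survives the pruning.
  // Returns None when nothing under `node` should be kept.
  def prune(node: Node, keep: String): Option[Node] = node match {
    case e: Elem if e.label == keep => Some(e)
    case e: Elem =>
      val kept = e.child.flatMap(c => prune(c, keep))
      if (kept.isEmpty) None else Some(e.copy(child = kept))
    case _ => None // text or comments outside a kept element are dropped
  }

  // Hypothetical top-level pass for the question's layout:
  // drop <second>, prune <third>, leave everything else alone.
  def clean(nodes: Seq[Node]): Seq[Node] = nodes.flatMap {
    case e: Elem if e.label == "second" => None
    case e: Elem if e.label == "third"  => prune(e, "keepattr")
    case other                          => Some(other)
  }

  def main(args: Array[String]): Unit = {
    val input: Seq[Node] = Seq(
      <first>foo</first>,
      <second>remove me</second>,
      <third><foo><keepattr>meh</keepattr><otherattr1>bar</otherattr1></foo><otherattr2>quux</otherattr2></third>
    )
    clean(input).foreach(println)
  }
}
```

Note that prune is applied only to <third>: called on <first> it would return None, because <first> contains no keepattr. Whitespace text nodes between the removed siblings are dropped as well, which may or may not matter for your output.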

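Answer 1 (score: 0)

I'd like to point out another approach that removes a lot of the complexity, although it isn't as pretty: extract all the information you need from the XML, store it in vals, and rebuild the XML structure by hand, provided you know that structure in advance.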
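A minimal sketch of what this extract-and-rebuild idea could look like for the XML in the question. The names RebuildSketch and rebuild are hypothetical, the scala-xml module is assumed to be on the classpath, and the target structure (<third><foo>...</foo></third>) is hard-coded because this approach assumes it is known in advance.

```scala
import scala.xml._

object RebuildSketch {
  // Hypothetical helper: extracts the parts worth keeping and rebuilds
  // the known structure around them by hand.
  def rebuild(nodes: NodeSeq): NodeSeq = {
    // Top-level <first> elements are kept as they are.
    val first: NodeSeq = nodes.filter(_.label == "first")
    // Only the keepattr elements under third/foo are extracted.
    val keepAttr: NodeSeq = nodes.filter(_.label == "third") \ "foo" \ "keepattr"

    // <second> is never extracted, so it simply disappears.
    NodeSeq.fromSeq(first ++ <third><foo>{keepAttr}</foo></third>)
  }

  def main(args: Array[String]): Unit = {
    val input: NodeSeq = NodeSeq.fromSeq(Seq(
      <first>foo</first>,
      <second>remove me</second>,
      <third><foo><keepattr>meh</keepattr><otherattr1>bar</otherattr1></foo><otherattr2>quux</otherattr2></third>
    ))
    println(rebuild(input))
  }
}
```

The trade-off is that every change to the schema has to be mirrored in rebuild by hand, and if no keepattr is found the sketch still emits an empty <foo/> inside <third>.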