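Scala: how to retrieve an XML tag with optional attributes

Time: 2019-04-10 15:38:59

Tags: xml scala scala-xml

I am trying to get a scala-xml node's tag together with its attributes. I want only the tag name and its attributes, not the child elements.

I have this input: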

<substance-classes>
    <nucleic-acid-sequence display-name="Nucleic Acid Sequence">
        <nucleic-acid-base>
            <base-symbol>a</base-symbol>
            <count>295</count>
        </nucleic-acid-base>
        <nucleic-acid-base>
            <base-symbol>c</base-symbol>
            <count>329</count>
        </nucleic-acid-base>
        <nucleic-acid-base>
            <base-symbol>g</base-symbol>
            <count>334</count>
        </nucleic-acid-base>
        <nucleic-acid-base>
            <base-symbol>t</base-symbol>
            <count>268</count>
        </nucleic-acid-base>
    </nucleic-acid-sequence>
    <genbank-information>
        <genbank-accession-number>EU186063</genbank-accession-number>
    </genbank-information>
</substance-classes>

I am trying to replace the contents of <nucleic-acid-sequence> by doing the following:

val newNucleicAcidSequenceNode = <nucleic-acid-sequence>{ myfunction 
} </nucleic-acid-sequence>

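But some <nucleic-acid-sequence> elements carry attributes, for example <nucleic-acid-sequence display-name="Nucleic Acid Sequence">. Since my newNucleicAcidSequenceNode is a hard-coded tag, I am losing the attributes.

How can I keep the optional attributes and still pass { myfunction } into the <nucleic-acid-sequence> tag?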

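1 Answer:

Answer 0: (score: 1)

So, if I understand you correctly:

  • You want to replace only part of the XML
  • That part is the children of any nucleic-acid-sequence under substance-classes
  • You do not want to lose any attributes of that nucleic-acid-sequence
  • The children are to be changed through a function (myFunction)

So here is my answer for that case: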

import scala.xml.{Node, Elem}

val myXml: Elem =
      <substance-classes>
        <nucleic-acid-sequence display-name="Nucleic Acid Sequence">
          <nucleic-acid-base>
            <base-symbol>a</base-symbol>
            <count>295</count>
          </nucleic-acid-base>
          <nucleic-acid-base>
            <base-symbol>c</base-symbol>
            <count>329</count>
          </nucleic-acid-base>
          <nucleic-acid-base>
            <base-symbol>g</base-symbol>
            <count>334</count>
          </nucleic-acid-base>
          <nucleic-acid-base>
            <base-symbol>t</base-symbol>
            <count>268</count>
          </nucleic-acid-base>
        </nucleic-acid-sequence>
        <genbank-information>
          <genbank-accession-number>EU186063</genbank-accession-number>
        </genbank-information>
      </substance-classes>

def myFunction(children: Seq[Node]) : Seq[Node] = ??? // whatever you want it to be

// Here's the replacement:

myXml.copy(child = myXml.child.map {
  // Match on the element type and label; copy() keeps the label, attributes and scope
  case e: Elem if e.label == "nucleic-acid-sequence" =>
    e.copy(child = myFunction(e.child.toSeq))
  case other => other
})
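
The attributes survive because Elem.copy falls back to the element's current values for every field you do not name (prefix, label, attributes, scope), so passing only child leaves everything else alone. A minimal sketch of just that behaviour, assuming scala-xml is on the classpath; the <item> element and the CopyKeepsAttributes object are made up for illustration:

```scala
import scala.xml.Elem

object CopyKeepsAttributes {
  // Hypothetical element with one attribute and one child.
  val original: Elem = <item id="42"><old/></item>

  // Only `child` is named, so label, attributes and scope are carried over.
  val updated: Elem = original.copy(child = Seq(<new/>))
}
```

Here CopyKeepsAttributes.updated still carries the id attribute, while its only child is now <new/>.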

For example, myFunction could keep only the children whose count is above 300, and might look like this:

import scala.util.{ Try, Success }
def myFunction(children: Seq[Node]): Seq[Node] = children.collect {
  // Keep a node only if its <count> parses as an Int greater than 300
  case e: Node if Try((e \ "count").text.toInt > 300) == Success(true) =>
    e
}
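
One detail worth knowing: the children of a pretty-printed element also include whitespace-only text nodes between the tags. For those, (e \ "count").text is the empty string, toInt throws, and the Try becomes a Failure, so they are dropped together with the low counts. The sketch below shows this on a made-up input (the FilterDemo object, the keepHighCounts name and the <seq> wrapper are assumptions for illustration) and writes the same condition with Try(...).toOption.exists:

```scala
import scala.util.Try
import scala.xml.{Elem, Node}

object FilterDemo {
  // Same rule as myFunction: keep nodes whose <count> parses to an Int above 300.
  def keepHighCounts(children: Seq[Node]): Seq[Node] =
    children.filter(n => Try((n \ "count").text.toInt).toOption.exists(_ > 300))

  // parent.child holds the two elements plus the whitespace text around them.
  val parent: Elem =
    <seq>
      <nucleic-acid-base><count>295</count></nucleic-acid-base>
      <nucleic-acid-base><count>329</count></nucleic-acid-base>
    </seq>

  // Only the element with count 329 passes; whitespace text nodes do not.
  val kept: Seq[Node] = keepHighCounts(parent.child.toSeq)
}
```

If the whitespace matters for your output, the function would have to let text nodes through explicitly.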

In that case, if you substitute this for the unimplemented myFunction in the first snippet, the result of the replacement is:

  <substance-classes>
    <nucleic-acid-sequence display-name="Nucleic Acid Sequence"><nucleic-acid-base>
        <base-symbol>c</base-symbol>
        <count>329</count>
      </nucleic-acid-base><nucleic-acid-base>
        <base-symbol>g</base-symbol>
        <count>334</count>
      </nucleic-acid-base></nucleic-acid-sequence>
    <genbank-information>
      <genbank-accession-number>EU186063</genbank-accession-number>
    </genbank-information>
  </substance-classes>

As you can see, the attributes of nucleic-acid-sequence are not lost, and under the given condition the function kept two of the four nodes.

Hope that helps.
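
Note that mapping over myXml.child only looks at direct children of the root. If nucleic-acid-sequence can sit deeper in the tree, one option is scala-xml's RewriteRule and RuleTransformer, which visit every node. A rough sketch, assuming scala-xml is on the classpath; DeepRewrite, rewriteSequences and the sample document are names made up here:

```scala
import scala.xml.{Elem, Node}
import scala.xml.transform.{RewriteRule, RuleTransformer}

object DeepRewrite {
  // Hypothetical helper: apply `f` to the children of every
  // <nucleic-acid-sequence>, at any depth, keeping its attributes.
  def rewriteSequences(root: Node)(f: Seq[Node] => Seq[Node]): Node = {
    val rule = new RewriteRule {
      override def transform(n: Node): Seq[Node] = n match {
        case e: Elem if e.label == "nucleic-acid-sequence" =>
          e.copy(child = f(e.child.toSeq))
        case other => other
      }
    }
    new RuleTransformer(rule).apply(root)
  }

  // Made-up document where the target element is not a direct child of the root.
  val doc: Elem =
    <root><wrapper><nucleic-acid-sequence display-name="X"><a/><b/></nucleic-acid-sequence></wrapper></root>

  // Example: drop all children of every matching element.
  val emptied: Node = rewriteSequences(doc)(_ => Seq.empty)
}
```

This walks the whole tree, so it does more work than the single map; for the input in the question, where the element is a direct child of the root, the copy approach above is enough.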