How do I parse XML with mixed nodes and text in Java?

Time: 2018-03-08 21:57:57

Tags: java xml xpath xml-parsing

I have XML in the following format:

<root>
      <sentence>
           first part of the text 

           <a id="interpolation_1"> </a>

           second part of the text

           <a id="interpolation_2"> </a>
      </sentence>
</root>

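Basically, the <sentence> tag represents a sentence, and the child <a> tags are the interpolated parts of that sentence.

The XPath expression String sentence = xPath.evaluate("sentence", transUnitElement); returns the text first part of the text second part of the text, i.e. the interpolations are omitted.

The XPath expression

NodeList aList = (NodeList) xPath.evaluate("/sentence/a", transUnitElement, XPathConstants.NODESET); gives me the list of <a> elements.

How can I parse these to get the text of the <sentence> element together with the <a> elements, without losing the order and position of the <a> elements?

Expected output: the first part of the sentence {interpolation_1} second part of the text {interpolation_2}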

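2 answers:

Answer 0 (score: 1)

The result you are looking for can be achieved by iterating over the child nodes of sentence and building the target string step by step. For example: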

// retrieve <sentence> as Node, not as text
Node sentence = (Node) xPath.evaluate("sentence", transUnitElement, XPathConstants.NODE);

StringBuilder resultBuilder = new StringBuilder();
NodeList children = sentence.getChildNodes();

for (int i = 0; i < children.getLength(); i++) {
  Node child = children.item(i);
  short nodeType = child.getNodeType();
  switch (nodeType) {
    case Node.TEXT_NODE:
      String text = child.getTextContent().trim();
      resultBuilder.append(text);
      break;
    case Node.ELEMENT_NODE:
      String id = ((Element) child).getAttribute("id");
      resultBuilder.append(" {").append(id).append("} ");
      break;
    default:
      throw new IllegalStateException("Unexpected node type: " + nodeType);
  }
}
// prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
// (trim() drops the trailing space appended after the last placeholder)
System.out.println(resultBuilder.toString().trim());
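
For anyone who wants to try the child-node iteration end to end, here is a minimal self-contained sketch. The class name SentenceFlattener and the helpers flatten and flattenXml are made up for illustration, and the XML is parsed from an inline string rather than from the transUnitElement in the question. Unlike the snippet above, it collapses runs of whitespace at the end instead of trimming each text node, and it silently skips comments and processing instructions; both are design assumptions, not requirements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SentenceFlattener {

    // Hypothetical helper: walks the direct children of <sentence> in document
    // order, copying text and replacing each child element with {id}.
    static String flatten(Node sentence) {
        StringBuilder out = new StringBuilder();
        NodeList children = sentence.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                out.append('{').append(((Element) child).getAttribute("id")).append('}');
            }
            // Other node types (comments, processing instructions) are skipped.
        }
        // Collapse indentation and line breaks into single spaces.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Hypothetical convenience wrapper: parses the XML string and flattens
    // the first /root/sentence element.
    static String flattenXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            Node sentence = (Node) xPath.evaluate("/root/sentence", doc, XPathConstants.NODE);
            return flatten(sentence);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><sentence> first part of the text <a id=\"interpolation_1\"> </a>"
                + " second part of the text <a id=\"interpolation_2\"> </a></sentence></root>";
        // prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
        System.out.println(flattenXml(xml));
    }
}
```

Collapsing whitespace once at the end keeps a single space between a text fragment and the following placeholder without hard-coding spaces around the braces.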

Answer 1 (score: 1)

Have you considered using an XSLT transformation for this? In XSLT 3.0, it is simply

<xsl:template match="sentence">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="a">{<xsl:value-of select="@id"/>}</xsl:template>
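
If you want to run templates like these from Java, the following is a rough sketch using the JAXP transformation API (javax.xml.transform). It rests on several assumptions: the class name SentenceXslt and the transform helper are invented for illustration, the two templates are wrapped in a stylesheet with text output, and the stylesheet declares version 1.0 because the processor bundled with the JDK only supports XSLT 1.0 (the templates use nothing beyond 1.0). The a template selects @id so that the placeholder carries the id from the expected output; a true XSLT 3.0 setup would need a processor such as Saxon on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SentenceXslt {

    // Assumed wrapper stylesheet around the two templates; version 1.0 so the
    // JDK's built-in XSLT processor accepts it.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"sentence\"><xsl:apply-templates/></xsl:template>"
          + "<xsl:template match=\"a\">{<xsl:value-of select=\"@id\"/>}</xsl:template>"
          + "</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to an XML string and
    // collapses whitespace in the resulting text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().replaceAll("\\s+", " ").trim();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><sentence> first part of the text <a id=\"interpolation_1\"> </a>"
                + " second part of the text <a id=\"interpolation_2\"> </a></sentence></root>";
        System.out.println(transform(xml));
    }
}
```

The built-in template rules copy the text nodes of sentence through unchanged, so only the a elements need an explicit template.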