I have XML in the following format:
<root>
<sentence>
first part of the text
<a id="interpolation_1"> </a>
second part of the text
<a id="interpolation_2"> </a>
</sentence>
</root>
Basically, the <sentence> tag represents a sentence, and the child <a> tags are the interpolated parts of that sentence.
The XPath expression
String sentence = xPath.evaluate("sentence", transUnitElement);
returns the text as "first part of the text second part of the text", i.e. the interpolations are omitted.
The XPath expression
NodeList aList = (NodeList) xPath.evaluate("/sentence/a", transUnitElement, XPathConstants.NODESET);
returns the list of <a> elements.
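How can I parse these to get the text of the <sentence> element together with the <a> elements, without losing the order and position of the <a> elements?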
Expected output:
first part of the text {interpolation_1} second part of the text {interpolation_2}
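Answer 0 (score: 1)
You can get the result you are looking for by iterating over the child nodes of sentence and building the target string step by step. For example: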
// retrieve <sentence> as Node, not as text
Node sentence = (Node) xPath.evaluate("sentence", transUnitElement, XPathConstants.NODE);
StringBuilder resultBuilder = new StringBuilder();
NodeList children = sentence.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
Node child = children.item(i);
short nodeType = child.getNodeType();
switch (nodeType) {
case Node.TEXT_NODE:
String text = child.getTextContent().trim();
resultBuilder.append(text);
break;
case Node.ELEMENT_NODE:
String id = ((Element) child).getAttribute("id");
resultBuilder.append(" {").append(id).append("} ");
break;
default:
throw new IllegalStateException("Unexpected node type: " + nodeType);
}
}
// trim() drops the trailing space left after the last placeholder
// prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
System.out.println(resultBuilder.toString().trim());
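For anyone who wants to try this without the surrounding XPath setup, here is a minimal self-contained sketch of the same child-node iteration. The class name SentenceFlattener and its methods are made up for illustration, and it differs from the snippet above in two assumed ways: it skips comments and other node types instead of throwing, and it joins the pieces with a single space so no trimming is needed at the end.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class SentenceFlattener {

    // Walks the direct children of <sentence> in document order and
    // replaces each child element with "{id}".
    static String flatten(Element sentence) {
        StringBuilder out = new StringBuilder();
        NodeList children = sentence.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            String piece;
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                piece = child.getNodeValue().trim();
            } else if (type == Node.ELEMENT_NODE) {
                piece = "{" + ((Element) child).getAttribute("id") + "}";
            } else {
                continue; // assumption: comments and the like are ignored
            }
            if (piece.isEmpty()) {
                continue; // whitespace-only text between elements
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(piece);
        }
        return out.toString();
    }

    // Parses the XML string and flattens the first <sentence> it finds.
    static String flattenFirstSentence(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element sentence = (Element) doc.getElementsByTagName("sentence").item(0);
        return flatten(sentence);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><sentence>\nfirst part of the text\n"
                + "<a id=\"interpolation_1\"> </a>\nsecond part of the text\n"
                + "<a id=\"interpolation_2\"> </a>\n</sentence></root>";
        System.out.println(flattenFirstSentence(xml));
    }
}
```

Note that this only looks at direct children of <sentence>; if <a> elements can be nested deeper, the walk would have to recurse.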
Answer 1 (score: 1)
Have you considered using an XSLT transformation for this? In XSLT 3.0 it is simply
<xsl:template match="sentence">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="a">{<xsl:value-of select="@id"/>}</xsl:template>
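As a rough sketch of how these templates could be run from Java, the example below wraps them in a complete stylesheet and applies it with the standard javax.xml.transform API. Several things here are assumptions rather than part of the answer: the stylesheet is declared as version 1.0 so that the JDK's built-in processor can run it (an XSLT 3.0 processor such as Saxon would be needed for real 3.0 features), the <a> template selects @id because the expected output shows the id values, the text output method is used, and whitespace is collapsed in Java afterwards. The class name SentenceXslt is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper class, for illustration only.
final class SentenceXslt {

    // The two templates from the answer inside a full stylesheet.
    // Assumption: version 1.0 and method="text" are my additions.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"sentence\">"
            + "<xsl:apply-templates/>"
            + "</xsl:template>"
            + "<xsl:template match=\"a\">{<xsl:value-of select=\"@id\"/>}</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        // The text nodes keep their line breaks, so collapse runs of
        // whitespace into single spaces and trim the ends.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n<sentence>\nfirst part of the text\n"
                + "<a id=\"interpolation_1\"> </a>\nsecond part of the text\n"
                + "<a id=\"interpolation_2\"> </a>\n</sentence>\n</root>";
        System.out.println(transform(xml));
    }
}
```

The whitespace cleanup could also be done inside the stylesheet with normalize-space(); doing it in Java just keeps the templates identical to the ones above.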