Parsing XML text that contains self-closing tags

Time: 2018-04-19 17:21:26

Tags: java xml gate

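Hi everyone, I'm trying to parse this part of an XML file I have. The problem is that the text contains a lot of self-closing tags. I can't remove these tags because they give me some index details. How can I access the text without all the "Node" tags?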

<TextWithNodes>
 <Node id="0"/>A TEENAGER <Node
id="11"/>yesterday<Node id="20"/> accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2<Node id="146"/>.<Node
id="147"/>
</TextWithNodes>

3 answers:

Answer 0 (score: 2)

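Although it looks odd, this XML is actually well-formed and can be parsed with ordinary XML tools. The TextWithNodes element simply has mixed content (text interleaved with child elements).

The string value of TextWithNodes can be obtained with a simple XPath expression,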

string(/TextWithNodes)

which yields the text you want, without any other markup (self-closing or otherwise):

 A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
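
If you already have the document loaded as a DOM tree, the same string value should also be reachable without XPath, through the standard Node.getTextContent() method. The following is only a minimal sketch: the class name MixedContentText and the helper textOf are made up for illustration, and the input is a shortened version of the sample from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MixedContentText {

    // Hypothetical helper: returns the concatenated text of the root element.
    // Empty child elements such as <Node id="0"/> contribute nothing to it.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap the checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TextWithNodes><Node id=\"0\"/>A TEENAGER <Node id=\"11\"/>"
                + "yesterday<Node id=\"20\"/></TextWithNodes>";
        System.out.println(textOf(xml)); // prints "A TEENAGER yesterday"
    }
}
```

Whitespace between the tags is part of the text content, so with the full sample from the question the result keeps its line breaks.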

Answer 1 (score: 1)

Here is some sample code that applies the answer https://stackoverflow.com/a/49926918/2735286 (credit to @kjhughes) using XPath in Java:

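import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// The class name is illustrative; the original snippet showed only main().
public class TextWithNodesExample {
    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException, XPathExpressionException {

        String text = "<TextWithNodes>\n" +
                " <Node id=\"0\"/>A TEENAGER <Node\n" +
                "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n" +
                "by feeding him a daily diet of chips which sent his weight\n" +
                "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n" +
                "id=\"147\"/>\n" +
                "</TextWithNodes>";
        // Parse the XML string into a DOM document.
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        // Evaluating the element as a STRING returns its string value: all descendant text, no tags.
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = "//TextWithNodes";
        System.out.println(xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING));
    }
}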

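This prints the following text (the actual output keeps the line breaks and the leading space from the source XML; they are collapsed here):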

A TEENAGER yesterday accused his parents of cruelty by feeding him a daily diet of chips which sent his weight ballooning to 22st at the age of l2.
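
If you want the single-line form shown above directly from the code, one option is to wrap the path in the XPath 1.0 function normalize-space(), which trims the ends and collapses each run of whitespace to a single space. This is a sketch under that assumption, not part of the original answer; the class name NormalizedText and the helper normalizedText are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class NormalizedText {

    // Hypothetical helper: string value of /TextWithNodes with whitespace normalized.
    static String normalizedText(String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            // evaluate(String, InputSource) parses the source and returns the result as a String.
            return xPath.evaluate("normalize-space(/TextWithNodes)",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Wrap the checked exception so callers need no throws clause.
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TextWithNodes>\n"
                + " <Node id=\"0\"/>A TEENAGER <Node\n"
                + "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n"
                + "by feeding him a daily diet of chips which sent his weight\n"
                + "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n"
                + "id=\"147\"/>\n"
                + "</TextWithNodes>";
        System.out.println(normalizedText(xml));
    }
}
```

Keep in mind that the normalized text no longer has the same character positions as the raw string value, so if the Node ids are used as offsets into the text, they would have to be applied to the raw, un-normalized value.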

Answer 2 (score: 0)

Use a parser library such as Jsoup, which can also parse XML. https://jsoup.org/

How to do this is covered in the answers to this question: How to parse XML with jsoup