Question

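I am using XPath to read an XHTML document, and I want to read all the elements inside a <p> tag of the XHTML file. To do that, I am doing something like this: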

XPath xpath = XPathFactory.newInstance().newXPath();                
XPathExpression expr = xpath.compile("//p[2]/*");                 
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("Nodes>>>>>>>>"+nodes.item(i).getNodeValue());
}

The XHTML sample looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>test</title></head>
    <body>
        <p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc</span> </p> 
        <p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc1</span> </p>
        <p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc2</span> </p>
    </body>
</html>

But I cannot get the nodes inside the <p> tag; the for loop is never entered.

Can anyone help me solve this problem?

Thanks in advance.

Answer 1

       XPathExpression expr = xpath.compile(".//*[local-name()='p'][@id='ur_id']");

Can you check this? I think it will get you your node. It would also be worth visiting http://saxon.sourceforge.net/saxon6.5/expressions.html to learn the basics of XPath expressions.
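The local-name() idea can be tried against a shortened copy of the question's sample. This is only a minimal sketch under some assumptions: the class name LocalNameDemo and the helper childTexts are made up for illustration, the @id='ur_id' predicate is dropped because the sample has no id attributes, and the parser is assumed to be namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original question or answer.
public class LocalNameDemo {
    // Shortened copy of the question's sample (style attributes omitted).
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>test</title></head><body>"
        + "<p class=\"default\"> <span> Test Doc</span> </p>"
        + "<p class=\"default\"> <span> Test Doc1</span> </p>"
        + "<p class=\"default\"> <span> Test Doc2</span> </p>"
        + "</body></html>";

    // Evaluates the expression and returns the trimmed text content of each matched node.
    static List<String> childTexts(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: the document is parsed namespace-aware
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // local-name() compares only the local part of the name, so the XHTML namespace is ignored.
        System.out.println(childTexts(XHTML, "//*[local-name()='p'][2]/*")); // prints [Test Doc1]
    }
}
```

Matching on local-name() avoids registering a namespace, at the cost of also matching p elements from any other namespace.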

Answer 2

Your code is trying to print the nodeValue of Element nodes, which is unlikely to be what you want. I expect you want the nodeValue of the Text nodes.
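The difference between the two node types can be seen with plain DOM calls. The following is a small sketch, not the asker's code: the class NodeValueDemo and the helper firstSpan are hypothetical names, and the one-line XHTML fragment is a stand-in for the real document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original question or answer.
public class NodeValueDemo {
    // Parses the XML and returns the first XHTML <span> element.
    static Element firstSpan(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "span").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element span = firstSpan(
            "<p xmlns=\"http://www.w3.org/1999/xhtml\"><span> Test Doc1</span></p>");
        Node text = span.getFirstChild();          // the Text node inside the span

        System.out.println(span.getNodeValue());   // null: Element nodes have no node value
        System.out.println(text.getNodeValue());   // the character data of the Text node
        System.out.println(span.getTextContent()); // concatenated text of all descendants
    }
}
```

In the question's loop, either selecting the text nodes with a trailing /text() step or calling getTextContent() on the matched elements should give printable text.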

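Another problem may be namespaces. It looks like your XPath is trying to match p elements in no namespace, whereas it should be matching p elements in the http://www.w3.org/1999/xhtml namespace.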
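One way to make the expression namespace-aware with the standard javax.xml.xpath API is to register a NamespaceContext and use a prefix in the expression. The sketch below assumes a namespace-aware parser; the class name NamespaceDemo, the helper select, and the prefix x are arbitrary choices for illustration, and the sample is a shortened copy of the one in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original question or answer.
public class NamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static final String XHTML =
        "<html xmlns=\"" + XHTML_NS + "\"><body>"
        + "<p><span> Test Doc</span></p>"
        + "<p><span> Test Doc1</span></p>"
        + "</body></html>";

    // Maps the (arbitrary) prefix "x" to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "x" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return (prefix == null
                ? Collections.<String>emptyList()
                : Collections.singletonList(prefix)).iterator();
        }
    };

    // Returns the trimmed text content of every node matched by the expression.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: namespace-aware parsing
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XHTML, "//p[2]/*"));   // prints []: unprefixed p means "no namespace"
        System.out.println(select(XHTML, "//x:p[2]/*")); // prints [Test Doc1]
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the namespace URI it maps to matters.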

Answer 3

You can use XPathAPI (javadoc) to extract nodes as generic Java lists.

String expr = "//p[2]/*";

// Map is an interface, so instantiate a concrete implementation (java.util.HashMap).
Map<String, String> ns = new HashMap<String, String>();
ns.put("html", "http://www.w3.org/1999/xhtml");

List<String> nodeValues = XPathAPI.html.selectNodeListAsStrings(doc, expr, ns);
for (String nodeValue : nodeValues) {
    System.out.println("Nodes>>>>>>>> " + nodeValue);
}

Or

List<Node> nodes = XPathAPI.html.selectListOfNodes(doc, expr, ns);
for (Node node : nodes) {
    System.out.println("Nodes>>>>>>>> " + node.getTextContent());
}

Disclaimer: I am the author of the XPathAPI library.

Problem reading XHTML tags using XPath
