I need help with a problem, or at least some advice. I'm using HtmlCleaner to parse an HTML document and query it with XPath.
I have something like this:
<html>
[code and other <h4> tags]
<h4>Random name</h4>
<a href="link" target="target"> Text I want to get </a>
<a href="link2" target="target2"> Text I want to get 2 </a>
<a href="link3" target="target3"> Text I want to get 3 </a>
<a href="link4" target="target4"> Text I want to get 4 </a>
<h4> Random name 2 </h4>
<a href="link" target="target"> Text I don't want to get </a>
[code and other <h4> tags]
</html>
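OK. I have several <h4> tags, each followed by <a> tags with some text. My problem is that I don't know how to get all the <a> elements that belong to one specific <h4>, something like "h4[i]". I tried something like this, but it doesn't work: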
String xpath = "h4[" + number + "]//a"; // where number will increment
Thanks for any advice and help!
Answer 0 (score: 1)
Use:
/*/h4[1]/following-sibling::a[not(preceding-sibling::h4[2])]/text()
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/h4[1]/following-sibling::a[not(preceding-sibling::h4[2])]/text()"/>
</xsl:template>
</xsl:stylesheet>
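When this transformation is applied to the following XML document (the provided fragment, wrapped in a single top element so that it is a well-formed XML document):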
<html>
<h4>Random name</h4>
<a href="link" target="target"> Text I want to get </a>
<a href="link2" target="target2"> Text I want to get 2 </a>
<a href="link3" target="target3"> Text I want to get 3 </a>
<a href="link4" target="target4"> Text I want to get 4 </a>
<h4> Random name 2 </h4>
<a href="link" target="target"> Text I don't want to get </a>
</html>
the XPath expression is evaluated and all selected (text) nodes are copied to the output:
Text I want to get Text I want to get 2 Text I want to get 3 Text I want to get 4
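Since the question increments a number to walk through the headings, the same idea can be generalized to the k-th <h4>: keep only the following <a> siblings that have exactly k preceding <h4> siblings. Below is a minimal Java sketch of that variant. It uses the JDK's javax.xml.xpath on the well-formed sample rather than HtmlCleaner, and the class and method names (H4Links, xpathFor, linkTexts) are made up for illustration. Whether HtmlCleaner's own XPath evaluation supports the following-sibling axis is something to check in its documentation; if it does not, converting the cleaned tree to a W3C DOM first is one possible route. Note also that a cleaned HTML document usually has the headings under <body>, so the leading /*/ step would probably need adjusting there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HtmlCleaner or of the original answer.
public class H4Links {

    // XPath for the <a> siblings that follow the k-th <h4> (1-based)
    // and come before the next <h4>, if there is one.
    static String xpathFor(int k) {
        return "/*/h4[" + k + "]/following-sibling::a"
                + "[count(preceding-sibling::h4) = " + k + "]";
    }

    // Returns the trimmed text of every <a> that belongs to the k-th <h4>.
    static List<String> linkTexts(String xml, int k) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    xpathFor(k),
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The same well-formed sample document as above.
        String xml = "<html>"
                + "<h4>Random name</h4>"
                + "<a href=\"link\" target=\"target\"> Text I want to get </a>"
                + "<a href=\"link2\" target=\"target2\"> Text I want to get 2 </a>"
                + "<a href=\"link3\" target=\"target3\"> Text I want to get 3 </a>"
                + "<a href=\"link4\" target=\"target4\"> Text I want to get 4 </a>"
                + "<h4> Random name 2 </h4>"
                + "<a href=\"link\" target=\"target\"> Text I don't want to get </a>"
                + "</html>";

        // prints [Text I want to get, Text I want to get 2, Text I want to get 3, Text I want to get 4]
        System.out.println(linkTexts(xml, 1));
        // prints [Text I don't want to get]
        System.out.println(linkTexts(xml, 2));
    }
}
```

The count(...) = k predicate is used here instead of not(preceding-sibling::h4[2]) because it stays correct for any k, not just the first heading.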