Select specific HTML tags from a list of HTML tags with an XPath selector

Asked: 2018-04-05 18:24:51

Tags: html xpath

I want to extract some specific information from this HTML code:

<div class="main">    
    <div class="a"><div><a>linkname1</a></div></div> <!-- I DON'T want get the text of this 'a' tag --> 
    <div class="b">xxx</div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname2</a></div></div> <!-- I want get the text of this 'a' tag --> 
    <div class="a"><div><a>linkname3</a></div></div> <!-- I want get the text of this 'a' tag --> 
    <div class="a"><div><a>linkname4</a></div></div> <!-- I want get the text of this 'a' tag -->  
    <div class="a"><div><a>linkname5</a></div></div> <!-- I want get the text of this 'a' tag --> 
    <div class="d"></div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname6</a></div></div> <!-- I DON'T want get the text of this 'a' tag --> 
    <div class="a"><div><a>linkname7</a></div></div> <!-- I DON'T want get the text of this 'a' tag --> 
    <div class="a"><div><a>linkname8</a></div></div> <!-- I DON'T want get the text of this 'a' tag --> 
    <div class="d"></div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname9</a></div></div> <!-- I DON'T want get the text of this 'a' tag --> 
    <div class="a"><div><a>linkname10</a></div></div> <!-- I DON'T want get the text of this 'a' tag --> 
</div>

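I would like to get, as an array, the list of link texts in the second block of divs with class 'a' (the ones between the first div with class 'c' and the second div with class 'c'). How can I do that with an XPath selector? Is it possible? I can't figure out how to do it.

For my example, the expected result is: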

linkname2
linkname3
linkname4
linkname5

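Thanks :)

3 answers:

Answer 0 (score: 2)

Your problem is a set problem, as described in this answer: [{3}}.

Applied to your specific case, you should use the intersection, as follows: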

(: intersection :)
$set1[count(. | $set2) = count($set2)]

set1 should be the set of siblings following div[@class='c'], and set2 should be the set of siblings preceding div[@class='d'].

Now put the two together using the formula above:

set1 = "div[@class='c'][1]/following-sibling::*" and
set2 = "div[@class='d'][1]/preceding-sibling::*"

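The XPath expression could then look like this (current() is an XSLT function, so this form assumes it is evaluated in XSLT with the div class="main" element as the current node):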

div[@class='c'][1]/following-sibling::*[count(. | current()/div[@class='d'][1]/preceding-sibling::*) = count(current()/div[@class='d'][1]/preceding-sibling::*)]

Output

linkname2
linkname3
linkname4
linkname5
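
The intersection idea can also be checked outside an XPath engine. The following is only a sketch under stated assumptions: it uses Python's standard-library `xml.etree.ElementTree`, whose XPath subset has no `following-sibling` or `preceding-sibling` axes, so both sets are built by list slicing instead. The names `HTML`, `first_index`, `set1`, `set2` and `between` are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Copy of the question's markup with the HTML comments removed.
HTML = """
<div class="main">
    <div class="a"><div><a>linkname1</a></div></div>
    <div class="b">xxx</div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname2</a></div></div>
    <div class="a"><div><a>linkname3</a></div></div>
    <div class="a"><div><a>linkname4</a></div></div>
    <div class="a"><div><a>linkname5</a></div></div>
    <div class="d"></div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname6</a></div></div>
    <div class="a"><div><a>linkname7</a></div></div>
    <div class="a"><div><a>linkname8</a></div></div>
    <div class="d"></div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname9</a></div></div>
    <div class="a"><div><a>linkname10</a></div></div>
</div>
"""

main = ET.fromstring(HTML)
children = list(main)  # direct child divs of div.main, in document order


def first_index(cls):
    """Index of the first child whose class attribute equals cls."""
    return next(i for i, el in enumerate(children) if el.get("class") == cls)


# set1: siblings following the first div.c
set1 = children[first_index("c") + 1:]
# set2: siblings preceding the first div.d
set2 = children[:first_index("d")]

# Intersection of the two sets, kept in document order
# (Element objects compare by identity here).
between = [el for el in set1 if el in set2]

link_texts = [el.findtext("div/a") for el in between]
print(link_texts)
```

The elements that belong to both sets are exactly the four class 'a' divs between the first 'c' and the first 'd', which is the same selection the XPath intersection formula describes.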

Answer 1 (score: 0)

You can try this expression:

/div/div[position() > 3 and position() < 8]/div/a/text()

Answer 2 (score: 0)

I found a possible solution :)

//following::div[@class='a' and count(preceding::div[@class="c"]) = 1]/div/a/text()
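
The condition in this expression (a class 'a' div with exactly one class 'c' div before it) can be mirrored in plain Python as a rough cross-check. This is a sketch, not an XPath evaluation: it assumes Python's standard-library `xml.etree.ElementTree`, which does not support the `preceding` axis or `count()`, so a running counter over the children of div.main stands in for `count(preceding::div[@class="c"])`. That substitution is only equivalent because every class 'c' div in this markup is a direct child of div.main. The names `HTML`, `c_seen` and `link_texts` are illustrative.

```python
import xml.etree.ElementTree as ET

# Copy of the question's markup with the HTML comments removed.
HTML = """
<div class="main">
    <div class="a"><div><a>linkname1</a></div></div>
    <div class="b">xxx</div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname2</a></div></div>
    <div class="a"><div><a>linkname3</a></div></div>
    <div class="a"><div><a>linkname4</a></div></div>
    <div class="a"><div><a>linkname5</a></div></div>
    <div class="d"></div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname6</a></div></div>
    <div class="a"><div><a>linkname7</a></div></div>
    <div class="a"><div><a>linkname8</a></div></div>
    <div class="d"></div>
    <div class="c">xxx</div>
    <div class="a"><div><a>linkname9</a></div></div>
    <div class="a"><div><a>linkname10</a></div></div>
</div>
"""

main = ET.fromstring(HTML)

link_texts = []
c_seen = 0  # number of class 'c' divs encountered so far
for div in main.findall("div"):
    cls = div.get("class")
    if cls == "c":
        c_seen += 1
    elif cls == "a" and c_seen == 1:
        # Exactly one class 'c' div precedes this class 'a' div.
        link_texts.append(div.findtext("div/a"))

print(link_texts)
```

Changing the comparison to `c_seen == 2` would pick the next block (linkname6 to linkname8), which matches how the number in the XPath predicate selects the block.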