XPath to select between two HTML comments is not working?

Asked: 2013-10-29 16:18:00

Tags: html ruby xpath nokogiri scraper

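I am trying to select the content between two HTML comments, but I am having trouble getting it right (I followed "XPath to select between two HTML comments?"). The problem seems to occur when a new comment starts on the same line where the previous one ends.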

My HTML:

<html>
 ........
 <!-- begin content -->
 <div>some text</div>
 <div>
   <p>Some more elements</p>
 </div>
 <!-- end content --><!-- begin content -->
 <div>more text</div>
 <!-- end content -->
 .......
</html>

I use:

doc.xpath("//node()[preceding-sibling::comment()[. = ' begin content ']]
          [following-sibling::comment()[. = ' end content ']]")

Result:

<div>some text</div>
<div>
  <p>Some more elements</p>
</div>
<!-- end content --><!-- begin content -->
<div>more text</div>

What I want:

<div>some text</div>
<div>
  <p>Some more elements</p>
</div>

1 Answer:

Answer 0: (score: 1)

If you are only interested in the first pair of comments, you can anchor on the first comment:

//comment()[1][.=' begin content ']/following::*[not(preceding::comment()[.=' end content '])]

That is:

//comment()[1][.=' begin content ']           <-- look for the first suitable comment
    /following::*                             <-- take all elements that follow it
         [not(preceding::comment()[.=' end content '])] <-- keep only those with no preceding ' end content ' comment