Question

I have some HTML:

<hr noshade>
<p><a href="#1">Some text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">This is some description</span></p>
<hr noshade> <!-- so <hr noshade> is the delimiter for me -->
<p><a href="#2">Some more text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">This is description for some more text</span></p>
<hr noshade>

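While parsing this with Nokogiri, I want to print the information in each set of tags delimited by my own delimiter, <hr noshade>. So the first block should print the information from all the "p" tags between the first two <hr noshade> tags, and so on.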

Answer 1

I am using the accepted answer to "XPath select all elements between two specific elements":

I only have a semi-safe solution.

You can use this XPath expression:

.//hr[1][@noshade]
  /following-sibling::*[not(self::hr[@noshade])]
                       [count(preceding-sibling::hr[@noshade])=1]

for the first group, i.e. the elements between <hr noshade> 1 and 2.

Then,

.//hr[2][@noshade]
  /following-sibling::*[not(self::hr[@noshade])]
                       [count(preceding-sibling::hr[@noshade])=2]

for the elements between <hr noshade> 2 and 3, and so on.

What these expressions select:

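all following siblings of the <hr noshade> at position N,
that have exactly N <hr noshade> siblings before them, i.e. that are located in the Nth group,
and that are not <hr noshade> themselves.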

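Since this selects several elements between two <hr noshade> tags, you may need to loop over the results and extract the data from each sibling element.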

Is there a more generic solution?

Something that parses the HTML from a defined start point to a defined end point?

1 answer: