Question

I have the following block:

<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>

How can I get the text after the first H1, ignoring the <br/><br/>, so that the result is:

Text1 
text12
text 13

I'm using Grab (Python): page = g.doc.select('//div[@class="text"]/h1[1]/following-sibling::text()') and the result is:

Text1
text12
text 13
Text11
Text2

Answer 1

You could try selecting only the text() nodes that have exactly one preceding-sibling h1:

//div[@class='text']/text()[count(preceding-sibling::h1)=1]
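To make it concrete what that predicate selects, here is a minimal standard-library sketch. It does not evaluate XPath; it only mimics the rule "direct text children of the div that have exactly one h1 before them" using html.parser. The class name TextAfterFirstH1, the helper text_after_first_h1 and the VOID_TAGS set are hypothetical names invented for this illustration, and the stripping of whitespace is an extra convenience that the XPath itself does not perform.

```python
from html.parser import HTMLParser

HTML = """<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>"""

# Tags without content; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextAfterFirstH1(HTMLParser):
    """Collect direct text children of div.text preceded by exactly one h1."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside the target div, 1 = directly inside it
        self.h1_seen = 0  # h1 children of the div closed so far
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            if tag == "div" and ("class", "text") in attrs:
                self.depth = 1
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if tag == "h1" and self.depth == 1:
            self.h1_seen += 1

    def handle_data(self, data):
        # Same idea as count(preceding-sibling::h1)=1, plus whitespace cleanup.
        if self.depth == 1 and self.h1_seen == 1 and data.strip():
            self.texts.append(data.strip())


def text_after_first_h1(html):
    parser = TextAfterFirstH1()
    parser.feed(html)
    parser.close()
    return parser.texts


print(text_after_first_h1(HTML))
```

Because Text11 and Text2 both have two h1 siblings before them, they fail the count test and are left out.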

Another option is to try the Kayessian method:

//div[@class='text']/h1[1]/following-sibling::text()[count(.|//div[@class='text']/h1[1+1]/preceding-sibling::text()) = count(//div[@class='text']/h1[1+1]/preceding-sibling::text())]
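The Kayessian method is a node-set intersection: a node from the first set is kept only if adding it to the second set does not make that set any larger, which is what count(.|$ns2) = count($ns2) tests. The following is a rough sketch of that idea in plain Python, not an XPath evaluation. The function name kayessian_intersection and the string labels standing in for the text nodes are assumptions made for this illustration.

```python
def kayessian_intersection(ns1, ns2):
    """Keep each node of ns1 whose union with ns2 is no larger than ns2."""
    ns2_set = set(ns2)
    return [node for node in ns1 if len(ns2_set | {node}) == len(ns2_set)]


# Labels standing in for the text nodes of the div, in document order.
# h1[1]/following-sibling::text()
after_first_h1 = ["Text1", "text12", "text 13", "Text11", "Text2"]
# h1[2]/preceding-sibling::text(), which also includes the whitespace
# node that sits before the first h1
before_second_h1 = ["(leading whitespace)", "Text1", "text12", "text 13"]

print(kayessian_intersection(after_first_h1, before_second_h1))
```

Only the nodes that sit both after the first h1 and before the second h1 survive the intersection.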

Here is a better example and explanation of the Kayessian method.
