We have data like this:
<h3>title1</h3>
<p> paragraph 1</p>
<p> paragraph 2</p>
<p> paragraph 3</p>
<h3>title2</h3>
<p> paragraph 4</p>
<p> paragraph 5</p>
<table>
<tr>
<td>data1</td>
<td>data2</td>
</tr>
</table>
<h3>title3</h3>
<p> paragraph 6</p>
<p> paragraph 7</p>
<p> paragraph 8</p>
<p> paragraph 9</p>
<h3>title4</h3>
<p> paragraph 10</p>
<p> paragraph 11</p>
<p> paragraph 12</p>
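How can I get the data between the h3 elements, i.e.
[paragraph 1, paragraph 2, paragraph 3]
[paragraph 4, paragraph 5, data1, data2]
[paragraph 6, paragraph 7, paragraph 8, paragraph 9]
[paragraph 10, paragraph 11, paragraph 12]
I tried the following XPath expressions: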
hdoc.xpath('h3[contains(.,"title1")]//following-sibling::*[following::*[self::h3]]//text()')
hdoc.xpath('h3[contains(.,"title2")]//following-sibling::*[following::*[self::h3]]//text()')
Answer 0 (score: 1)
Try something like this:
hdoc.xpath("//p[./preceding-sibling::h3[contains(text(),'title1')] and ./following-sibling::h3[contains(text(),'title2')]]/text()")
hdoc.xpath("//p[./preceding-sibling::h3[contains(text(),'title2')] and ./following-sibling::h3[contains(text(),'title3')]]/text()")
hdoc.xpath("//p[./preceding-sibling::h3[contains(text(),'title3')] and ./following-sibling::h3[contains(text(),'title4')]]/text()")
hdoc.xpath("//p[./preceding-sibling::h3[contains(text(),'title4')] and not(./following-sibling::h3)]/text()")
If you don't want to rely on the text of each h3, you can instead count the h3 elements that precede each element:
# For elements between title1 and title2
hdoc.xpath('//p[count(preceding-sibling::h3)=1]/text() | //table[count(preceding-sibling::h3)=1]//td/text()')
# For elements between title2 and title3
hdoc.xpath('//p[count(preceding-sibling::h3)=2]/text() | //table[count(preceding-sibling::h3)=2]//td/text()')
...
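The counting idea can be generalized so that every section is collected in one loop. The following is only a sketch, assuming lxml is installed, that the h3 elements and their content are siblings under one parent, and that the <p> tags are closed properly. The SAMPLE string, its wrapping div, and the helper name sections_by_h3_count are illustrative and do not come from the original post.

```python
from lxml import html

# Sample markup from the question, wrapped in a div (an assumption) and
# with the <p> tags closed properly.
SAMPLE = """<div>
<h3>title1</h3>
<p> paragraph 1</p>
<p> paragraph 2</p>
<p> paragraph 3</p>
<h3>title2</h3>
<p> paragraph 4</p>
<p> paragraph 5</p>
<table>
<tr>
<td>data1</td>
<td>data2</td>
</tr>
</table>
<h3>title3</h3>
<p> paragraph 6</p>
<p> paragraph 7</p>
<p> paragraph 8</p>
<p> paragraph 9</p>
<h3>title4</h3>
<p> paragraph 10</p>
<p> paragraph 11</p>
<p> paragraph 12</p>
</div>"""


def sections_by_h3_count(hdoc):
    """Return one list of stripped text strings per h3, in document order."""
    groups = []
    n_headings = len(hdoc.xpath("//h3"))
    for i in range(1, n_headings + 1):
        # Any non-h3 element with exactly i preceding h3 siblings belongs
        # to the i-th section; $i is passed as an XPath variable.
        texts = hdoc.xpath(
            "//*[not(self::h3)][count(preceding-sibling::h3) = $i]//text()",
            i=i,
        )
        # Drop whitespace-only text nodes (e.g. newlines inside the table).
        groups.append([t.strip() for t in texts if t.strip()])
    return groups


hdoc = html.fromstring(SAMPLE)
groups = sections_by_h3_count(hdoc)
for group in groups:
    print(group)
```

Selecting `*` rather than `p` and `table` separately means the table cells are picked up without a second union branch, at the cost of also collecting text from any other element type that appears between the headings.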
Answer 1 (score: 0)
This XPath,
//text()[ preceding::h3[. = 'title1']
and following::h3[. = 'title2']]
will select all text nodes between the h3 elements that have the given string values.
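Note that the selected text nodes include the whitespace-only nodes that sit between elements. A possible way to use this expression from lxml while skipping those nodes is sketched below. It assumes lxml is installed; the shortened SAMPLE string, its wrapping div, and the helper name text_between are illustrative, and the extra normalize-space() predicate is an addition to the expression above.

```python
from lxml import html

# Shortened sample (an assumption): the first two sections of the
# question's markup, with the <p> tags closed properly.
SAMPLE = """<div>
<h3>title1</h3>
<p> paragraph 1</p>
<p> paragraph 2</p>
<p> paragraph 3</p>
<h3>title2</h3>
<p> paragraph 4</p>
<p> paragraph 5</p>
<table>
<tr>
<td>data1</td>
<td>data2</td>
</tr>
</table>
<h3>title3</h3>
</div>"""


def text_between(hdoc, start, end):
    """Return stripped text found between two h3 elements, matched by exact text."""
    nodes = hdoc.xpath(
        # normalize-space() filters out whitespace-only text nodes.
        "//text()[normalize-space()]"
        "[preceding::h3[. = $start] and following::h3[. = $end]]",
        start=start,
        end=end,
    )
    return [n.strip() for n in nodes]


hdoc = html.fromstring(SAMPLE)
first = text_between(hdoc, "title1", "title2")
second = text_between(hdoc, "title2", "title3")
print(first)
print(second)
```

Because the expression requires a following h3, the last section of a document has no closing heading to match and would need a separate expression, for example one that uses not(following::h3).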