Question

Here is my HTML:

<div class="main">
<p>Abcd</p>
<p>Abcd</p>
<h3>Head 3.1</h3>
<p>Abcd</p>
<h3>Head 3.2</h3>
</div>

I need to use XPath to select the <p> tags inside the <div> that come before the first <h3> tag. How can I do that?

Answer 1

You can add the predicate [not(preceding-sibling::h3)] to your XPath to keep only the <p> nodes that have no <h3> sibling before them:

> response.xpath("//div/p[not(preceding-sibling::h3)]").extract()
< [u'<p>Abcd</p>', u'<p>Abcd</p>']
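Outside a Scrapy shell, the same selection can be sketched with only the Python standard library. This is an alternative technique, not the answer's method: `xml.etree.ElementTree` supports only a subset of XPath and has no `preceding-sibling` axis, so the sketch walks the `<div>`'s children in order and stops at the first `<h3>`. For the direct children of one `<div>`, that picks the same `<p>` elements as `p[not(preceding-sibling::h3)]`. It assumes the HTML is well-formed XML, and `p_before_first_h3` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="main">
<p>Abcd</p>
<p>Abcd</p>
<h3>Head 3.1</h3>
<p>Abcd</p>
<h3>Head 3.2</h3>
</div>"""


def p_before_first_h3(div):
    """Hypothetical helper: return the <p> children of `div` that have no
    preceding <h3> sibling, i.e. the <p> elements before the first <h3>."""
    result = []
    for child in div:  # iterates direct children in document order
        if child.tag == "h3":
            break  # every later sibling has an <h3> before it
        if child.tag == "p":
            result.append(child)
    return result


root = ET.fromstring(HTML)  # root is the <div class="main"> element
matches = p_before_first_h3(root)
print([p.text for p in matches])  # prints ['Abcd', 'Abcd']
```

If `lxml` is installed, the answer's XPath can be used as-is instead, e.g. `lxml.html.fromstring(HTML).xpath("//div/p[not(preceding-sibling::h3)]")`.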