Question

My HTML looks something like this:

<div>
    <h4><a href="#">Don't Match Me</a></h4>
    <a href="#">Match Me</a>
    <a href="#">Match Me</a>
    <a href="#">Match Me</a>
</div>

How do I select all the a elements that are not inside an h4? I tried a[not(ancestor::h4)], but honestly I have no idea what I'm doing.

Also, a small side question: is there a Scrapy selector class that implements PyQuery or BeautifulSoup?

Answer 1

This is the Scrapy selector you should use:

hxs.select('//a[not(ancestor::h4)]/text()').extract()
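Note that hxs.select(...).extract() is the older HtmlXPathSelector API. In current Scrapy versions the same query would be written as response.xpath('//a[not(ancestor::h4)]/text()').getall(). The predicate not(ancestor::h4) keeps an a element only if no h4 appears anywhere among its ancestors, and the leading // searches the whole document.

To check what that expression should match without installing Scrapy, here is a minimal sketch that uses only Python's standard html.parser module. It is not Scrapy's API. The class LinksOutsideH4 and the function links_outside_h4 are hypothetical names, and the sketch assumes simple, well-formed markup like the snippet above, where each a contains only text.

```python
from html.parser import HTMLParser


class LinksOutsideH4(HTMLParser):
    """Collect the text of <a> elements that have no <h4> ancestor.

    Hypothetical helper that approximates //a[not(ancestor::h4)]/text()
    for simple markup where each <a> holds plain text only.
    """

    def __init__(self):
        super().__init__()
        self.h4_depth = 0            # number of open <h4> elements enclosing us
        self.in_matching_a = False   # True while inside an <a> not under any <h4>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.h4_depth += 1
        elif tag == "a" and self.h4_depth == 0:
            self.in_matching_a = True

    def handle_endtag(self, tag):
        if tag == "h4":
            # Guard against a stray </h4> in malformed input.
            self.h4_depth = max(0, self.h4_depth - 1)
        elif tag == "a":
            self.in_matching_a = False

    def handle_data(self, data):
        if self.in_matching_a:
            self.texts.append(data)


def links_outside_h4(html):
    """Return the text of every <a> that is not nested inside an <h4>."""
    parser = LinksOutsideH4()
    parser.feed(html)
    parser.close()
    return parser.texts


SNIPPET = """<div>
    <h4><a href="#">Don't Match Me</a></h4>
    <a href="#">Match Me</a>
    <a href="#">Match Me</a>
    <a href="#">Match Me</a>
</div>"""

if __name__ == "__main__":
    print(links_outside_h4(SNIPPET))  # ['Match Me', 'Match Me', 'Match Me']
```

Unlike XPath's text(), which returns only the direct text children of each a, this sketch also collects text from any elements nested inside a matching a. That is why it is limited to plain-text links like the ones in the question.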