Question

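I want my spider to pick up the link to the next page.

The page is http://quotes.toscrape.com/

I have two variants. The first, based on CSS syntax, works, but the second, which I hoped would be the XPath equivalent, does not.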

next_page_url = response.css('li.next > a::attr(href)').extract_first()

The following does not work:

next_page_url = response.xpath('/a[contains(@href,"next")]/@href').extract_first()

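So, although I can use the CSS version, I would still like to know what is wrong with the XPath expression, such that it does not return the same result as its CSS equivalent.

Thanks

Here is the code that follows the link: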

# Follow the pagination link
next_page_url = response.css('li.next > a::attr(href)').extract_first()
if next_page_url:
    next_page_url = response.urljoin(next_page_url)
    yield scrapy.Request(url=next_page_url, callback=self.parse)
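
For context on the snippet above: the href extracted from the pager is a relative path, which is why it is passed through response.urljoin before the request is made. The sketch below uses the standard library's urllib.parse.urljoin, which performs the same kind of resolution; the relative paths /page/2/ and /page/3/ are assumed from the site's pager markup, not taken from the question.

```python
from urllib.parse import urljoin

# Assumed values: the page being parsed and the relative href found in the pager.
page_url = "http://quotes.toscrape.com/"
relative_href = "/page/2/"

# Resolve the relative href against the page URL to get an absolute URL.
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)

# A root-relative href replaces the whole path, whatever page we are on.
later_url = urljoin("http://quotes.toscrape.com/page/2/", "/page/3/")
print(later_url)
```

Without this step the request URL would have no scheme or host.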

Answer 1

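The href of the target link in the HTML does not contain "next" (on that page it is a path such as /page/2/), so the predicate contains(@href, "next") matches nothing. Also note that a leading single / selects only from the document root, whose root element is html rather than a; use // to search the whole document. Try the following expression, which matches on the link text instead:

next_page_url = response.xpath('//a[contains(text(), "Next")]/@href').extract_first()

If you want a close analogue of your CSS selector:

next_page_url = response.xpath('//li[contains(@class, "next")]/a/@href').extract_first()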
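
To see why matching on @href fails while matching on the link text or on the li class succeeds, here is a minimal sketch. It does not use Scrapy's selectors: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (no contains()), so the substring checks are written in plain Python. The markup is an assumed, simplified copy of the pager on quotes.toscrape.com.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified pager markup (the real page has more attributes and elements).
PAGER_HTML = """
<ul class="pager">
  <li class="next"><a href="/page/2/">Next</a></li>
</ul>
""".strip()

root = ET.fromstring(PAGER_HTML)
links = list(root.iter("a"))

# Plain-Python stand-in for contains(@href, "next"): the href is "/page/2/", so nothing matches.
by_href = [a.get("href") for a in links if "next" in a.get("href", "")]

# Plain-Python stand-in for contains(text(), "Next"): matches on the visible link text.
by_text = [a.get("href") for a in links if "Next" in (a.text or "")]

# ElementTree path for "an a directly under li with class next" (exact class match here).
by_class = [a.get("href") for a in root.findall(".//li[@class='next']/a")]

print(by_href)   # []
print(by_text)   # ['/page/2/']
print(by_class)  # ['/page/2/']
```

The word "next" appears in the class of the li element and in the link text, but not in the href itself, which is why only the last two approaches find the link.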
