I'm trying to scrape quotes from https://www.goodreads.com/quotes. I only get the first page; following the link to the next page doesn't work. Here is my code:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://www.goodreads.com/quotes'
    ]

    def parse(self, response):
        for quote in response.xpath("//div[@class='quote']"):
            yield {
                'quoteText': quote.xpath(".//div[@class='quoteText']").extract_first()
            }
        next_page = response.css("a").xpath("@href").extract()
        if next_page is not None:
            next_page_link = response.urljoin(next_page)
            yield scrapy.Request(url=next_page_link, callback=self.parse)
Answer 0 (score: 0)
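You need to get the href of the next-page link only. response.css("a").xpath("@href").extract() returns the href of every link on the page as a list of strings. A list is never None, so the if check always passes, and response.urljoin() expects a single URL string, not a list.

Use this instead to get the next page URL. .get() returns a single string, or None when the page has no next-page link (for example, on the last page):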
next_page=response.css("a.next_page::attr(href)").get()
You can read more about selectors here: https://docs.scrapy.org/en/latest/topics/selectors.html