Question

Here is my pagination code, which goes from the first page to the last:

    url = response.css("li.next a::attr(href)").extract_first()
    if url:
        url = response.urljoin(url)
        yield response.follow(url, self.parse)

The Scrapy 1.4 release notes show another way:

    for a in response.css('li.page a'):
        yield response.follow(a, self.parse)

I tried this:

    url = response.css("li.next a")[0]
    if url:
        yield response.follow(url, self.parse)

But on the last page I get the error "IndexError: list index out of range", which I can handle with try/except:

    try:
        url = response.css("li.next a")[0]
    except IndexError:
        pass
    else:
        yield response.follow(url, self.parse)

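Is there a better, shorter way to solve this, or should I stick with the old response.urljoin() approach to pagination? I ask because the release notes state, in bold, that response.follow "is now a recommended way to create Requests in Scrapy spiders".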

Answer 1

How about using the approach from the release notes?

    for a in response.css('li.next a'):
        yield response.follow(a, self.parse)

        # if more than one next can be found and you just need the first
        # break

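If no element matches the selector, the loop body never runs, so nothing is yielded and no IndexError is raised.
If a next element exists, the request for it is yielded.
The break at the end handles pages that contain more than one next element.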
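The control flow described above can be tried without Scrapy. The following is only a minimal sketch: plain lists stand in for the result of response.css() (which is a list subclass), and `follow_first` and `make_request` are hypothetical names invented for this illustration, with `make_request` standing in for response.follow(link, self.parse).

```python
def follow_first(matches, follow):
    """Yield follow(m) for the first item in matches, or nothing if it is empty.

    matches: any iterable; in a spider this would be response.css("li.next a").
    follow:  any callable; here a hypothetical stand-in for response.follow.
    """
    for match in matches:
        yield follow(match)
        break  # stop after the first match, even if several exist


def make_request(link):
    # Hypothetical stand-in for response.follow(link, self.parse).
    return f"Request({link})"


middle_page = ["/page/3", "/page/3"]  # two "next" links, e.g. top and bottom
last_page = []                        # the last page has no "next" link

print(list(follow_first(middle_page, make_request)))  # ['Request(/page/3)']
print(list(follow_first(last_page, make_request)))    # []
```

Indexing with `[0]` on the empty list would raise IndexError, whereas the loop simply does nothing, which is why the last page needs no try/except.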

How can response.css() be used with response.follow() for pagination on the last page in Scrapy?
