Question

I want to limit the number of items found on each page.

I found this documentation, which seems to fit my needs:

class scrapy.contracts.default.ReturnsContract

This contract (@returns) sets lower and upper bounds for the items and 
requests returned by the spider. The upper bound is optional:

@returns item(s)|request(s) [min [max]]

But I don't understand how to use this class. In my spider, I tried adding

ReturnsContract.__setattr__("max",10)

But it didn't work. Am I missing something?

Answer 1

Spider Contracts are meant for testing purposes, not for controlling your data extraction logic.

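Testing spiders can get particularly annoying and while nothing prevents you from writing unit tests the task gets cumbersome quickly. Scrapy offers an integrated way of testing your spiders by the means of contracts.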

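This allows you to test each callback of your spider by hardcoding a sample url and check various constraints for how the callback processes the response. Each contract is prefixed with an @ and included in the docstring.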
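As a rough illustration of what "included in the docstring" means, here is a minimal sketch of a callback carrying contracts. The spider name, the URL, and the XPath are placeholders, and the class is left as a plain class so the snippet runs without Scrapy installed; in a real project it would subclass scrapy.Spider.

```python
class ExampleSpider:  # assumption: would subclass scrapy.Spider in a real project
    name = "example"  # hypothetical spider name

    def parse(self, response):
        """Parse one listing page.

        The lines below are contracts; Scrapy reads them from the docstring.

        @url http://www.example.com/listing
        @returns items 1 10
        @returns requests 0 0
        """
        # Placeholder XPath taken from the answer; adjust to the real page.
        for value in response.xpath('//my/xpath').extract():
            yield {"value": value}
```

Running `scrapy check example` would fetch the `@url` page and report a failure if the callback returns fewer than 1 or more than 10 items. It only reports; it does not truncate the output of the spider.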

For your purpose, you can simply set an upper bound in your extraction logic, e.g.:

response.xpath('//my/xpath').extract()[:10]
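Put into a callback, the slice above might look like the following sketch. The names `LimitedSpider` and `MAX_ITEMS_PER_PAGE` and the item shape are assumptions made for illustration, and the class is kept as a plain class so the snippet runs without Scrapy; in a real project it would subclass scrapy.Spider.

```python
from itertools import islice

MAX_ITEMS_PER_PAGE = 10  # hypothetical per-page limit


class LimitedSpider:  # assumption: would subclass scrapy.Spider in a real project
    name = "limited"  # hypothetical spider name

    def parse(self, response):
        """Yield at most MAX_ITEMS_PER_PAGE items from a single response."""
        # Placeholder XPath from the answer; extract() returns a list of strings.
        values = response.xpath('//my/xpath').extract()
        # islice stops after the limit, equivalent to values[:MAX_ITEMS_PER_PAGE].
        for value in islice(values, MAX_ITEMS_PER_PAGE):
            yield {"value": value}
```

Because the cap is applied per call to `parse`, each page contributes at most ten items regardless of how many nodes match.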

How to set an upper bound with ReturnsContract in a Scrapy spider
