Question

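I'm using scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware to cache Scrapy requests. I want a response to be cached only if its status is 200. Is that the default behavior, or do I need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200?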

Answer 1

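Yes, you need to set it. By default, HttpCacheMiddleware runs DummyPolicy for requests. That policy does almost nothing special on its own: it caches any response whose status is not listed in HTTPCACHE_IGNORE_HTTP_CODES, so you need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200.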
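
As a rough sketch of what that setting could look like in settings.py: the 100-599 range below is an assumption about which status codes you will see, so adjust it if your servers return anything outside it.

```python
# settings.py (sketch): ignore every status code except 200.
# Assumption: response statuses fall within the standard 100-599 range.
HTTPCACHE_ENABLED = True
HTTPCACHE_IGNORE_HTTP_CODES = [code for code in range(100, 600) if code != 200]
```

Building the list with a comprehension avoids typing out every code by hand; any status missing from the list would still be cached.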

Here's the source for DummyPolicy; these are the lines that actually matter:

class DummyPolicy(object):

    def __init__(self, settings):
        self.ignore_http_codes = [int(x) for x in settings.getlist('HTTPCACHE_IGNORE_HTTP_CODES')]

    def should_cache_response(self, response, request):
        return response.status not in self.ignore_http_codes

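So you can also extend it and override should_cache_response() with something that explicitly checks for 200, i.e. return response.status == 200, and then set it as your cache policy via the HTTPCACHE_POLICY setting.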
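
To compare the two checks without running a crawl, here is a small stand-alone sketch. The classes DummyLikePolicy and Only200Policy and the helper cached_statuses are hypothetical stand-ins that copy only the status check quoted above; they are not Scrapy's own classes, and SimpleNamespace stands in for a response object.

```python
from types import SimpleNamespace


class DummyLikePolicy:
    """Hypothetical stand-in mirroring the DummyPolicy status check above."""

    def __init__(self, ignore_http_codes=()):
        self.ignore_http_codes = [int(x) for x in ignore_http_codes]

    def should_cache_response(self, response, request):
        return response.status not in self.ignore_http_codes


class Only200Policy(DummyLikePolicy):
    """Hypothetical subclass that agrees to cache only 200 responses."""

    def should_cache_response(self, response, request):
        return response.status == 200


def cached_statuses(policy, statuses):
    """Return the statuses the given policy would agree to cache."""
    return [s for s in statuses
            if policy.should_cache_response(SimpleNamespace(status=s), None)]


if __name__ == "__main__":
    sample = [200, 301, 404, 500]
    print(cached_statuses(DummyLikePolicy(), sample))
    print(cached_statuses(Only200Policy(), sample))
```

With an empty ignore list the first policy accepts every status in the sample, while the overriding subclass accepts only 200 regardless of the ignore list.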

Answer 2

The answer is no, you don't need to do that. Instead, write a CachePolicy and update settings.py to enable it. I put the CachePolicy class in middlewares.py:

from scrapy.extensions.httpcache import DummyPolicy

class CachePolicy(DummyPolicy):
    def should_cache_response(self, response, request):
        return response.status == 200

Then update settings.py, adding the following line:

HTTPCACHE_POLICY = 'yourproject.middlewares.CachePolicy'
