Persistent request metadata in Scrapy

Time: 2013-03-11 09:59:29

Tags: python scrapy

Is there any way to have persistent request metadata in a spider? request.meta only lasts until the next callback, so I have to do something like this:

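from scrapy import Request

def method1(self, response):
    request = Request(url, callback=self.method2)
    # Data that every later callback in the chain needs.
    request.meta['persist'] = ...

    yield request

def method2(self, response):
    ...

    request = Request(url, callback=self.method3)
    # meta is not inherited automatically, so copy it by hand at every hop.
    request.meta['persist'] = response.meta['persist']

    yield request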

I also wrote a decorator to do this, but I was really hoping for a cleaner solution:

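import functools

from scrapy import Request

def persist_meta(callback):
    """Copy response.meta['persist'] into every Request the callback yields."""
    @functools.wraps(callback)
    def inner(self, response, *args, **kwargs):
        for result in callback(self, response, *args, **kwargs):
            if isinstance(result, Request):
                # Start from a copy so sibling requests do not share one dict.
                persist = dict(response.meta.get('persist', {}))
                # Values set explicitly on the new request take precedence.
                persist.update(result.meta.get('persist', {}))

                result.meta['persist'] = persist

            yield result

    return inner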
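Why the merge should start from a copy: a plain-dict illustration with no Scrapy objects involved. Here `parent` stands in for `response.meta`, and `a` and `b` for the `meta` of two requests yielded from the same response; the function names are hypothetical. Updating the parent's dict in place makes every sibling request share, and mutate, one dict.

```python
def merge_in_place(parent_meta, child_meta):
    # Updates the parent's 'persist' dict directly (no copy).
    persist = parent_meta.get('persist', {})
    persist.update(child_meta.get('persist', {}))
    child_meta['persist'] = persist


def merge_copy(parent_meta, child_meta):
    # Starts from a shallow copy, so siblings never share one dict.
    persist = dict(parent_meta.get('persist', {}))
    persist.update(child_meta.get('persist', {}))
    child_meta['persist'] = persist


parent = {'persist': {'user': 'alice'}}
a, b = {'persist': {'page': 1}}, {}
merge_in_place(parent, a)
merge_in_place(parent, b)
print(b['persist'])  # {'user': 'alice', 'page': 1}: 'page' leaked from sibling a

parent = {'persist': {'user': 'alice'}}
a, b = {'persist': {'page': 1}}, {}
merge_copy(parent, a)
merge_copy(parent, b)
print(b['persist'])  # {'user': 'alice'}
```

The copy is shallow: nested mutable values inside `persist` are still shared, which is usually acceptable for small bookkeeping data.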

Any help is appreciated.

1 answer:

Answer 0 (score: 1)

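Create a new spider middleware and move your code into it. process_spider_input only receives the incoming response; the requests a callback yields pass through process_spider_output(response, result, spider), so that is the method where persist can be copied onto them.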