scrapy: Accessing spider class variables in the pipeline's __init__

Time: 2013-11-22 13:42:41

Tags: python-2.7 scrapy

I know you can access spider variables in process_item(), but how can I access them in the pipeline's __init__ function?

class SiteSpider(CrawlSpider):
    def __init__(self):
        self.id = 10

class MyPipeline(object):
    def __init__(self):
        ...

I also need to access CUSTOM_SETTINGS_VARIABLE in MyPipeline.

1 Answer:

Answer 0: (score: 7)

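You cannot access the spider instance there, because pipeline initialization happens when the engine starts up. In fact, you should assume that your pipeline may handle several spiders, not just one.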

That said, you can hook into the spider_opened signal to get access to the spider instance when it starts.

from scrapy import signals


class MyPipeline(object):

    def __init__(self, mysetting):
        # do stuff with the arguments...
        self.mysetting = mysetting

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        instance = cls(settings['CUSTOM_SETTINGS_VARIABLE'])
        crawler.signals.connect(instance.spider_opened, signal=signals.spider_opened)
        return instance

    def spider_opened(self, spider):
        # do stuff with the spider: initialize resources, etc.
        spider.log("[MyPipeline] Initializing resources for %s" % spider.name)

    def process_item(self, item, spider):
        return item
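
As a possible alternative to connecting the signal by hand, Scrapy item pipelines can also define an open_spider(self, spider) hook, which is called when a spider is opened. The sketch below is a minimal illustration of that idea, not the answer's own method. It assumes a Scrapy version whose pipeline hooks receive the spider argument. The fake_crawler and fake_spider objects, the 'example-value' setting, and the spider_id attribute are hypothetical stand-ins so the snippet runs without a crawl; in a real project Scrapy supplies the crawler and the spider.

```python
from types import SimpleNamespace


class MyPipeline(object):
    """Reads a setting at construction time and the spider when it opens."""

    def __init__(self, mysetting):
        self.mysetting = mysetting
        self.spider_id = None  # not known until a spider is opened

    @classmethod
    def from_crawler(cls, crawler):
        # Settings are available here; the spider instance is not.
        return cls(crawler.settings.get('CUSTOM_SETTINGS_VARIABLE'))

    def open_spider(self, spider):
        # Called when the spider is opened, so its attributes can be read now.
        # getattr guards against spiders that do not define `id`.
        self.spider_id = getattr(spider, 'id', None)

    def process_item(self, item, spider):
        return item


# Hypothetical stand-ins for illustration only; Scrapy normally provides these.
fake_crawler = SimpleNamespace(settings={'CUSTOM_SETTINGS_VARIABLE': 'example-value'})
fake_spider = SimpleNamespace(name='site', id=10)

pipeline = MyPipeline.from_crawler(fake_crawler)
pipeline.open_spider(fake_spider)
```

Because one pipeline instance may serve more than one spider, per-spider values such as spider_id are best set in the open hook (or looked up in process_item) instead of being fixed in __init__.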