I know you can access the spider variable in process_item(), but how can I access it in the pipeline's __init__ function?
class SiteSpider(CrawlSpider):
    def __init__(self):
        self.id = 10

class MyPipeline(object):
    def __init__(self):
        ...
I also need to access CUSTOM_SETTINGS_VARIABLE in MyPipeline.
Answer 0 (score: 7)
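You can't access the spider instance in the pipeline's __init__, because pipelines are initialized when the engine starts. In fact, you should think of your pipeline as handling multiple spiders, not just one.
That said, you can hook the spider_opened signal to get access to the spider instance when it starts. The setting itself can be read in from_crawler and passed to the constructor.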
from scrapy import signals

class MyPipeline(object):
    def __init__(self, mysetting):
        # do stuff with the arguments...
        self.mysetting = mysetting

    @classmethod
    def from_crawler(cls, crawler):
        # read the custom setting and pass it to the constructor
        settings = crawler.settings
        instance = cls(settings['CUSTOM_SETTINGS_VARIABLE'])
        # have Scrapy call instance.spider_opened when a spider is opened
        crawler.signals.connect(instance.spider_opened, signal=signals.spider_opened)
        return instance

    def spider_opened(self, spider):
        # do stuff with the spider: initialize resources, etc.
        spider.log("[MyPipeline] Initializing resources for %s" % spider.name)

    def process_item(self, item, spider):
        return item
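
As a variation on the above, item pipelines can also define an open_spider(self, spider) method, which Scrapy calls when the spider is opened, so the explicit signal connection is not strictly needed. The following is only a minimal sketch under stated assumptions: the class name SpiderAwarePipeline, the spider_id attribute, and the fake_crawler and fake_spider objects are hypothetical stand-ins (not real Scrapy classes) used to show the call order without running a crawl. It also assumes a Scrapy version in which pipeline hooks receive the spider as an argument.

```python
from types import SimpleNamespace


class SpiderAwarePipeline:
    """Hypothetical pipeline: reads a setting at construction, the spider later."""

    def __init__(self, mysetting):
        self.mysetting = mysetting
        self.spider_id = None  # unknown until a spider has been opened

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this factory with the crawler, which exposes the settings.
        return cls(crawler.settings.get("CUSTOM_SETTINGS_VARIABLE"))

    def open_spider(self, spider):
        # Pipeline hook called when the spider is opened; the spider instance
        # (and its attributes, such as the question's `id`) is available here.
        self.spider_id = getattr(spider, "id", None)

    def process_item(self, item, spider):
        return item


# Dry run with stand-in objects (assumptions, not real Scrapy classes).
# A plain dict stands in for the settings object because it also has .get().
fake_crawler = SimpleNamespace(settings={"CUSTOM_SETTINGS_VARIABLE": "demo"})
fake_spider = SimpleNamespace(name="site", id=10)

pipeline = SpiderAwarePipeline.from_crawler(fake_crawler)
pipeline.open_spider(fake_spider)
print(pipeline.mysetting, pipeline.spider_id)
```

In a real project the class would be enabled through ITEM_PIPELINES and Scrapy would make these calls itself; the dry run at the bottom exists only to illustrate that the setting is known at construction time while the spider is known only once it opens.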