Question

Following on from my previous question (How to pass parameter to a scrapy pipeline object), I'm trying to better understand the relationship between pipelines and crawlers in scrapy.

One of the answers was:

from sqlalchemy import Column, Integer, MetaData, Table, Text, create_engine


class SQLlitePipeline:

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "table" parameter
        settings = crawler.settings
        table = settings.get('table')

        # Instantiate the pipeline with your table
        return cls(table)

    def __init__(self, table):
        _engine = create_engine("sqlite:///data.db")
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("detail_url", Text))
        # Create the table in data.db if it does not exist yet
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

I'm confused about:

@classmethod
def from_crawler(cls, crawler):
    # Here, you get whatever value was passed through the "table" parameter
    settings = crawler.settings
    table = settings.get('table')

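Does the crawler class already exist, or are we creating it here? Could someone explain in more detail what is happening here? I've been reading a number of sources, including http://scrapy.readthedocs.io/en/latest/topics/api.html#crawler-api and http://scrapy.readthedocs.io/en/latest/topics/architecture.html, but I haven't managed to put it all together yet.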

Answer 1

It's me again :)

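Maybe what you're missing is the meaning of a classmethod in Python. In your case, it's a method that belongs to your SQLlitePipeline class. Therefore, cls is the SQLlitePipeline class itself.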
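
To see what cls actually is inside a classmethod, here is a minimal, self-contained sketch. The class name SQLlitePipeline comes from the question, but the body below is a hypothetical stand-in with no database code, and which_class is an illustrative method name:

```python
class SQLlitePipeline:
    """Hypothetical stand-in for the pipeline in the question (no database code)."""

    @classmethod
    def which_class(cls):
        # Inside a classmethod, cls is bound to the class itself, not to an instance.
        return cls


# Called directly on the class: no instance is created anywhere here.
result = SQLlitePipeline.which_class()
print(result is SQLlitePipeline)            # True
print(isinstance(result, SQLlitePipeline))  # False: it is the class, not an instance
```

Because no instance is involved, a classmethod is a natural place for an alternative constructor, which is exactly the role from_crawler plays.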

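Scrapy calls this pipeline method, passing in the crawler object, which Scrapy instantiates by itself. Up to this point, there is no SQLlitePipeline instance yet. In other words, the pipeline flow hasn't started yet.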

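After getting the parameter it needs (table) from the crawler's settings, from_crawler finally returns an instance of the pipeline by doing cls(table). (Remember what cls is, right? So it's the same as doing SQLlitePipeline(table).)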

This is a plain Python object instantiation, so __init__ is called with the table name it expects, and then the pipeline flow starts.
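
The same handshake can be reproduced without Scrapy or SQLAlchemy. In this minimal sketch, FakeCrawler is a hypothetical stand-in for the crawler Scrapy builds, a plain dict plays the role of crawler.settings (only .get() is used), and __init__ just stores the table name instead of creating a table:

```python
class FakeCrawler:
    """Hypothetical stand-in for Scrapy's Crawler: it only carries settings."""

    def __init__(self, settings):
        self.settings = settings


class SQLlitePipeline:
    def __init__(self, table):
        # In the real pipeline, this is where the engine and Table are created.
        print(f"__init__ called with table={table!r}")
        self.table = table

    @classmethod
    def from_crawler(cls, crawler):
        # No SQLlitePipeline instance exists yet; cls is the class itself.
        table = crawler.settings.get("table")
        # cls(table) is exactly SQLlitePipeline(table), so it runs __init__.
        return cls(table)


crawler = FakeCrawler({"table": "stack_items"})  # Scrapy would create this for you
pipeline = SQLlitePipeline.from_crawler(crawler)
print(type(pipeline).__name__, pipeline.table)
```

Running it prints the __init__ message first and then `SQLlitePipeline stack_items`, which shows that the instance only comes into existence inside from_crawler.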

EDIT

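It might help to have a step-by-step overview of the flow Scrapy executes. Of course, it's much more complex than what I'm about to show, but hopefully it will give you a better understanding.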

1) You call Scrapy

2) Scrapy instantiates a crawler object

crawler = Crawler(...)

3) Scrapy identifies the pipeline class you want to use (SQLlitePipeline) and calls its from_crawler method.

# Note that SQLlitePipeline is not instantiated here, as from_crawler is a class method
# However, as we saw before, this method returns an instance of the pipeline class
pipeline_instance = SQLlitePipeline.from_crawler(crawler)

4) From then on, it calls the pipeline instance methods listed here

pipeline_instance.open_spider(...)
pipeline_instance.process_item(...)
pipeline_instance.close_spider(...)
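
The four steps above can be imitated in a toy script. This is a simplified sketch, not Scrapy's real code: FakeSpider, FakeCrawler, ListPipeline and run_pipeline are hypothetical names, the items are made-up dicts, and the pipeline collects items in a list instead of writing to SQLite. The open_spider, process_item and close_spider method names follow Scrapy's item pipeline interface. The hasattr check reflects the general idea that Scrapy uses from_crawler when a component defines it; the real lookup lives in Scrapy's helper utilities and handles more cases.

```python
class FakeSpider:
    name = "demo"


class FakeCrawler:
    """Step 2 stand-in: the crawler is built first and carries the settings."""

    def __init__(self, spider, settings):
        self.spider = spider
        self.settings = settings


class ListPipeline:
    def __init__(self, table):
        self.table = table
        self.rows = []
        self.calls = []

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.get("table"))

    def open_spider(self, spider):
        self.calls.append("open_spider")

    def process_item(self, item, spider):
        self.calls.append("process_item")
        self.rows.append(item)
        return item  # hand the item on to the next pipeline

    def close_spider(self, spider):
        self.calls.append("close_spider")


def run_pipeline(pipeline_cls, items, settings):
    # Step 1: you start the run by calling this function.
    spider = FakeSpider()
    crawler = FakeCrawler(spider, settings)        # step 2
    if hasattr(pipeline_cls, "from_crawler"):      # step 3
        pipeline = pipeline_cls.from_crawler(crawler)
    else:
        pipeline = pipeline_cls()
    pipeline.open_spider(spider)                   # step 4
    for item in items:
        pipeline.process_item(item, spider)
    pipeline.close_spider(spider)
    return pipeline


p = run_pipeline(ListPipeline, [{"id": 1}, {"id": 2}], {"table": "stack_items"})
print(p.table, p.rows)
print(p.calls)
```

The recorded call order (open_spider, one process_item per item, close_spider) is the lifecycle described in step 4, and the pipeline object only exists from step 3 onwards.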
