Question

I have a custom pipeline whose constructor takes some arguments that need to be injected, for example:

class MyPipeline(object):
    def __init__(self, some_argument):
        self.some_argument = some_argument
...

The script from which I start the crawling process (let's call it run_crawler.py) looks like this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
...
process.start()

In settings.py:

ITEM_PIPELINES = {
    'crawler.pipelines.SomePipeline': 100,
    'crawler.pipelines.MyPipeline': 300
}

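I guess this is a silly question, but I couldn't find anything in the docs on how to instantiate MyPipeline with custom arguments. Could someone point me in the right direction?

In particular, I don't know how I should (or whether I should) modify run_crawler.py to instantiate MyPipeline with the custom argument. I guess it would be something like this: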

process = CrawlerProcess(get_project_settings())

process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
...
some_argument = ... # instantiate my custom argument
# this is made up, it's what i've been unable to find how to do properly
my_pipeline = MyPipeline(some_argument)
process.pipelines.append(my_pipeline, ...)

process.start()

Answer 1

You can use Scrapy's from_crawler method. The Scrapy documentation has a good description and example:

class MongoPipeline(object):

    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

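"If present, this classmethod is called to create a pipeline instance from a Crawler. It must return a new instance of the pipeline."

This way you can create new pipeline instances based on the crawler or spider settings.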
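Applied to the MyPipeline class from the question, the same pattern might look like the sketch below. The setting name MY_PIPELINE_ARGUMENT is hypothetical (it is not a built-in Scrapy setting), and process_item is a pass-through included only to make the pipeline complete:

```python
class MyPipeline(object):
    def __init__(self, some_argument):
        self.some_argument = some_argument

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this classmethod (if present) instead of calling
        # the constructor directly, so the argument is read from settings.
        # 'MY_PIPELINE_ARGUMENT' is a made-up setting name for illustration.
        return cls(some_argument=crawler.settings.get('MY_PIPELINE_ARGUMENT'))

    def process_item(self, item, spider):
        # Pass-through placeholder; self.some_argument is available here.
        return item
```

With this in place, run_crawler.py would not need to build the pipeline itself. One option is to set the value on the settings object before creating the process, for example settings = get_project_settings(), then settings.set('MY_PIPELINE_ARGUMENT', some_argument), then CrawlerProcess(settings). The entry in ITEM_PIPELINES stays as it is.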

Injecting arguments into Scrapy pipelines
