Question

I have a Scrapy pipelines.py and I want to access the arguments passed to the spider. In my spider.py this works perfectly:

class MySpider(CrawlSpider):
    def __init__(self, host='', domain_id='', user_id='', *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        print(user_id)
        ...

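Now I need the "user_id" in my pipelines.py to create a SQLite database named like "domain-123.db". I have searched all over the web for this problem, but I could not find a solution.

Can anyone help me?

PS: Yes, I tried using super() in my pipeline class, as in spider.py, but it does not work.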

Answer 1

Set the arguments as attributes in the spider's constructor:

class MySpider(CrawlSpider):
    def __init__(self, user_id='', *args, **kwargs):
        self.user_id = user_id

        super(MySpider, self).__init__(*args, **kwargs)

And read them in the pipeline's open_spider() method:

def open_spider(self, spider):
    # The spider instance is passed in, so its attributes are available here.
    print(spider.user_id)
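
To tie this back to the question, here is a minimal sketch of a pipeline that opens a per-user SQLite file in open_spider(). The class name SQLitePipeline and the attributes db_path and connection are made up for illustration; the file name pattern follows the "domain-123.db" example from the question. It assumes the spider was started with something like `scrapy crawl myspider -a user_id=123`, so that user_id reaches the spider's constructor.

```python
import sqlite3


class SQLitePipeline(object):
    """Hypothetical pipeline that opens one SQLite file per user_id."""

    def open_spider(self, spider):
        # Scrapy calls this once when the spider is opened. The spider
        # instance carries whatever attributes its __init__ has set.
        user_id = getattr(spider, 'user_id', '')
        # File name pattern taken from the question's "domain-123.db" example.
        self.db_path = 'domain-%s.db' % user_id
        self.connection = sqlite3.connect(self.db_path)

    def close_spider(self, spider):
        # Scrapy calls this once when the spider is closed.
        self.connection.close()

    def process_item(self, item, spider):
        # Storing the item would go here; this sketch passes it through.
        return item
```

The pipeline still has to be enabled through ITEM_PIPELINES in settings.py, as with any other pipeline.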

Answer 2

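I am probably too late to give the OP a useful answer, but for anyone who runs into this problem in the future (as I did): take a look at the class methods from_crawler and/or from_settings.

That way you can pass the arguments in whatever way suits you.

See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html#from_crawler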

from_crawler(cls, crawler)

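If present, this class method is called to create a pipeline instance from a Crawler. It must return a new instance of the pipeline. The Crawler object provides access to all Scrapy core components, such as settings and signals; it is a way for the pipeline to access them and hook its functionality into Scrapy.

Parameters: crawler (Crawler object) - the crawler that uses this pipeline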
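
As a rough illustration of the from_crawler approach, the sketch below builds the pipeline from a value in the crawler's settings. The setting name USER_ID and the class name SQLitePipeline are assumptions made up for this example, not anything Scrapy defines; the value could be supplied on the command line with `scrapy crawl myspider -s USER_ID=123` or set in settings.py.

```python
class SQLitePipeline(object):
    """Hypothetical pipeline configured through from_crawler."""

    def __init__(self, user_id):
        # The argument arrives via from_crawler rather than from the spider.
        self.user_id = user_id
        self.db_path = 'domain-%s.db' % user_id

    @classmethod
    def from_crawler(cls, crawler):
        # USER_ID is an assumed, user-defined setting name.
        # crawler.settings.get() returns the default if it is not set.
        return cls(user_id=crawler.settings.get('USER_ID', ''))

    def process_item(self, item, spider):
        # Storing the item would go here; this sketch passes it through.
        return item
```

Compared with reading spider.user_id in open_spider(), this keeps the pipeline independent of any particular spider class, at the cost of passing the value as a setting instead of a spider argument.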

Using arguments in a Scrapy pipeline's __init__
