Question

I know there is a way to do this, shown in the following example, which is taken from this question (and is also based on the documentation):

class MongoPipeline(object):

    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

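But as far as I know, this reads the values from the settings file. In my case I have to pass the value as an argument, because it is an arbitrary string supplied by the user, so it does not appear in any file.

This situation is similar to this question. However, I also need the parameter in the __init__ method, because my pipeline inherits from another arbitrary class that requires it. The workaround given for that second question (which only passes the argument through the spider) therefore does not work for me, since I need the value in __init__.

To clarify, my situation is as follows: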

class Foo():
    # This class is not a pipeline;
    # it is just an arbitrary class that manages the connections with the databases.
    def __init__(self, foo: str):
        self.foo = foo

class MyPipieline(Foo):
    def __init__(self, foo: str):
        # self must be passed explicitly when calling the parent initializer this way
        Foo.__init__(self, foo)

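Here foo is a string supplied by the user, so it cannot be read from a file.

Is there a way to achieve this?

Edit

Just to clarify: the foo argument is supplied by the user when calling the script, so my main looks like this: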

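import argparse

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
# MySpider is assumed to be imported from the project's spiders module

parser = argparse.ArgumentParser(description='My main script')
# Add arguments
parser.add_argument('-f', '--foo', type=str, required=True)

args = parser.parse_args()

foo = args.foo
# Here I have the foo value I want to use in pipeline's __init__

process = CrawlerProcess(get_project_settings())
process.crawl(MySpider)
process.start()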

Answer 1

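If the user supplies foo, it can probably be passed to the spider instance as an attribute, right?

In that case, you will have to defer the instantiation of Foo until later: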

class Foo:
    # Class logic
    ...

class MyPipieline():

    def __init__(self):
        # Create a dictionary mapping spider names to Foo instances
        self.foos = {}

    def open_spider(self, spider):
        # Instantiate Foo here, once the spider and its foo attribute exist
        self.foos[spider.name] = Foo(spider.foo)

    def close_spider(self, spider):
        self.foos[spider.name].close()  # If needed

The self.foos dictionary is needed because you may have several spiders, each with a different foo attribute, running simultaneously.
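
The per-spider dictionary described above can be tried out without a running crawl. What follows is only a minimal sketch under stated assumptions: PerSpiderFooPipeline is a hypothetical name, Foo.close is a hypothetical method, and the spiders are plain stub objects that carry only the name and foo attributes the pipeline reads. In a real project, calling process.crawl(MySpider, foo=foo) should be enough to populate spider.foo, since Scrapy forwards those keyword arguments to the spider's constructor and the default Spider.__init__ stores them as attributes.

```python
from types import SimpleNamespace


class Foo:
    """Stand-in for the connection manager; close() is a hypothetical method."""

    def __init__(self, foo: str):
        self.foo = foo
        self.closed = False

    def close(self):
        self.closed = True


class PerSpiderFooPipeline:
    """Hypothetical pipeline that holds one Foo per running spider."""

    def __init__(self):
        # Maps spider name -> Foo instance
        self.foos = {}

    def open_spider(self, spider):
        # Foo is created only once the spider (and its foo attribute) exists
        self.foos[spider.name] = Foo(spider.foo)

    def process_item(self, item, spider):
        # Look up the Foo that belongs to the spider that produced the item
        item["foo"] = self.foos[spider.name].foo
        return item

    def close_spider(self, spider):
        # Drop the entry and release its resources
        self.foos.pop(spider.name).close()


# Stub spiders standing in for real scrapy.Spider instances (assumption)
spider_a = SimpleNamespace(name="a", foo="first")
spider_b = SimpleNamespace(name="b", foo="second")

pipeline = PerSpiderFooPipeline()
pipeline.open_spider(spider_a)
pipeline.open_spider(spider_b)
item_a = pipeline.process_item({}, spider_a)
item_b = pipeline.process_item({}, spider_b)
pipeline.close_spider(spider_a)
```

Because the pipeline holds Foo instances instead of inheriting from Foo, its own __init__ takes no arguments, which is what lets Scrapy construct it before any spider is known.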
