Scrapy: problem passing multiple spider arguments through CrawlerRunner

Time: 2019-03-16 07:51:21

Tags: python variables scrapy runner

I wrote a spider whose constructor takes two variables, and I want to run it from a CrawlerRunner. I have tried:

yield runner.crawl(MySpider1, variable1, variable2)

yield runner.crawl(MySpider1, [variable1, variable2])

yield runner.crawl(MySpider1, (variable1, variable2))

yield runner.crawl(MySpider1(variable1, variable2))

But I got:

missing 1 required positional argument

Here is my code:

import scrapy
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

class MySpider(scrapy.Spider):

    def _init__(self, variable1, variable2, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.variable1 = variable1
        self.variable2 = variable2

    # a normal spider's parse() method would go below

class Run_Spider_From_SubClass(SpiderEmail):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        configure_logging()
        self.runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl(self):
        for variable1, variable2 in mydict.items():
            yield self.runner.crawl(MySpider, variable1, variable2)  # positional arguments here lead to the "missing 1 required positional argument" error
        reactor.stop()

    def run_spider_in_loop(self):
        self.crawl()
        reactor.run()

runner = Run_Spider_From_SubClass()
runner.run_spider_in_loop()

What is the correct way to pass spider arguments through the runner? Thanks.

1 answer:

Answer 0: (score: 0)

You don't need to override __init__ in MySpider (and note that you are still missing a _ there).

To pass arguments, you need to use named (keyword) arguments rather than positional ones:

yield self.runner.crawl(MySpider, variable1=variable1, variable2=variable2)

This automatically makes each value available inside your spider as an instance attribute, for example self.variable1.
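To illustrate why keyword arguments work without a custom __init__, here is a minimal sketch in plain Python. It does not import Scrapy: BaseSpiderSketch, MySpiderSketch and crawl_sketch are hypothetical stand-ins that only mimic how the base Spider constructor copies extra keyword arguments into the instance, and how crawl forwards its arguments to the spider class. Treat it as a simplified model under those assumptions, not as Scrapy's actual implementation.

```python
class BaseSpiderSketch:
    """Hypothetical stand-in for scrapy.Spider (assumption, not the real class)."""

    name = None

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Every extra keyword argument becomes an instance attribute.
        self.__dict__.update(kwargs)


class MySpiderSketch(BaseSpiderSketch):
    """No __init__ override is needed to receive variable1 and variable2."""

    name = "my_spider"

    def describe(self):
        return f"{self.name}: {self.variable1} -> {self.variable2}"


def crawl_sketch(spider_cls, *args, **kwargs):
    """Hypothetical stand-in for runner.crawl: forwards arguments to the spider class."""
    return spider_cls(*args, **kwargs)


# Keyword arguments end up as attributes on the spider instance.
spider = crawl_sketch(MySpiderSketch, variable1="a", variable2="b")
print(spider.describe())  # prints "my_spider: a -> b"
```

In this sketch, calling crawl_sketch(MySpiderSketch, "a", "b") with positional arguments raises a TypeError, because the base constructor accepts only one optional positional parameter (name). That is the same kind of failure the question runs into.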