I wrote a Scrapy spider that takes two variables when it starts, and I want to run it from a CrawlerRunner. I tried:
    yield runner.crawl(MySpider1, variable1, variable2)
or
    yield runner.crawl(MySpider1, [variable1, variable2])
or
    yield runner.crawl(MySpider1, (variable1, variable2))
or
    yield runner.crawl(MySpider1(variable1, variable2))
but each time I got
    missing 1 required positional argument
Here is my code:
    import scrapy
    from twisted.internet import reactor, defer
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings


    class MySpider(scrapy.Spider):
        def _init__(self, variable1, variable2, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.variable1 = variable1
            self.variable2 = variable2

        # below should be any normal spider's parser


    class Run_Spider_From_SubClass(SpiderEmail):
        def __init__(self, *args, **kwargs):
            super().__init__(self, *args, **kwargs)
            configure_logging()
            self.runner = CrawlerRunner(get_project_settings())

        @defer.inlineCallbacks
        def crawl(self):
            for variable1, variable2 in mydict.items():
                # input issue that results in "missing 1 required positional argument"
                yield self.runner.crawl(MySpider, variable1, variable2)
            reactor.stop()

        def run_spider_in_loop(self):
            self.crawl()
            reactor.run()


    runner = Run_Spider_From_SubClass()
    runner.run_spider_in_loop()
What is the correct way to pass variables to a spider through the Runner? Thanks.
Answer 0: (score: 0)
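You don't need to override __init__ in MySpider (and note that it is misspelled there as _init__, with one leading underscore missing).
To pass arguments, you need to use named (keyword) arguments rather than positional ones:
    yield self.runner.crawl(MySpider, variable1=variable1, variable2=variable2)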
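This automatically makes each value available in your spider as an instance attribute such as self.variable1, because the base scrapy.Spider constructor copies the keyword arguments it receives onto the instance.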
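To see why keyword arguments work without an __init__ override, here is a minimal sketch that runs without Scrapy installed. BaseSpider is a simplified stand-in for scrapy.Spider, and make_spider is a hypothetical helper that mimics how the runner forwards extra arguments to the spider constructor; neither is Scrapy's actual code, and the values "a" and "b" are placeholders.

```python
class BaseSpider:
    """Simplified stand-in for scrapy.Spider (illustration only)."""

    name = None

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Copy every keyword argument onto the instance as an attribute.
        self.__dict__.update(kwargs)


class MySpider(BaseSpider):
    # No __init__ override: the base constructor handles the arguments.
    name = "my_spider"


def make_spider(spider_cls, *args, **kwargs):
    """Hypothetical helper: forward extra arguments to the spider constructor."""
    return spider_cls(*args, **kwargs)


# Keyword arguments become instance attributes.
spider = make_spider(MySpider, variable1="a", variable2="b")
print(spider.variable1, spider.variable2)

# Positional arguments do not fit the base signature and raise TypeError.
positional_error = None
try:
    make_spider(MySpider, "a", "b")
except TypeError as exc:
    positional_error = exc
print(type(positional_error).__name__)
```

In the real project the equivalent call is the runner.crawl(MySpider, variable1=..., variable2=...) line from the answer; the sketch only illustrates why the keyword form ends up as self.variable1 and self.variable2 while the positional form is rejected.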