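I have a Scrapy spider that is called from a Celery task. Now I want to customize start_urls = [] based on some DB values. I know Scrapy accepts arguments from the shell, but that does not work in this case.

I have already tried passing the URL as an argument to the spider constructor, but it did not work: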

import scrapy
from scrapy.crawler import CrawlerProcess

# getting the arguments in the spider constructor
class MySpider(scrapy.Spider):
    start_urls = []

    def __init__(self, some_url, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.start_urls.append(some_url)

# running the spider
process = CrawlerProcess()
process.crawl(MySpider(some_url='www.google.com'))

In the terminal I can see that the first call does receive the "www.google.com" URL, but the next one does not.

Passing arguments to a Scrapy spider when instantiating it from a script

0 answers: