My project's spiders/ directory contains several Scrapy spiders (say 50 of them), and I want to run them sequentially, not concurrently.
I can run them all at once with the code below, but due to some policy constraints I need to run them one after another:
from datetime import datetime

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

start = datetime.now()
settings = get_project_settings()
process = CrawlerProcess(settings)
for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)  # schedules the crawl; nothing runs yet
process.start()  # starts the reactor and runs all scheduled crawls concurrently
print("*********** Execution time: {0}".format(datetime.now() - start))
I also tried:
import os

for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    os.system("pwd")  # pwd to make sure it's in the correct path, and I see it is
    os.system("pwd && scrapy crawl " + spider_name)
but the crawls don't seem to run when launched through os.system. Another option would be a .sh script, but I'm not sure that's a good idea.
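If shelling out is acceptable at all, I assume something like subprocess.run, which blocks until each child process exits, would give sequential runs without touching the Twisted reactor. A minimal sketch (assuming the scrapy command is on PATH and the script is run from the project root):

import subprocess

from scrapy.spiderloader import SpiderLoader
from scrapy.utils.project import get_project_settings

# Discover spider names without starting a Twisted reactor.
settings = get_project_settings()
spider_names = SpiderLoader.from_settings(settings).list()

for spider_name in spider_names:
    print("Running spider %s" % spider_name)
    # run() blocks until the child exits, so the crawls happen one at a time;
    # check=True raises if a crawl returns a nonzero exit code.
    subprocess.run(["scrapy", "crawl", spider_name], check=True)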
What is the right way to run the spiders sequentially?
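For reference, Scrapy's documentation describes running spiders sequentially in a single process by chaining deferreds with CrawlerRunner. A minimal sketch of that approach:

from twisted.internet import defer, reactor

from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

configure_logging()
settings = get_project_settings()
runner = CrawlerRunner(settings)

@defer.inlineCallbacks
def crawl_all():
    # Each yield waits for the previous crawl to finish before starting the next.
    for spider_name in runner.spider_loader.list():
        yield runner.crawl(spider_name)
    reactor.stop()

crawl_all()
reactor.run()  # the script blocks here until the last crawl finishes

Is this deferred-chaining pattern the recommended way, or is there a simpler option?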