Running Scrapy Spiders Sequentially

Date: 2017-11-05 20:28:00

Tags: python scrapy scrapy-spider

I have several Scrapy spiders in my project's spiders directory (say 50 of them), and now I want to run them sequentially (not concurrently).

I can run them all concurrently with the following code, but because of some policy constraints I have decided to run them one after another:

from datetime import datetime

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

start = datetime.now()
settings = get_project_settings()
process = CrawlerProcess(settings)
for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    # crawl() only schedules the spider; all scheduled spiders run
    # concurrently once process.start() is called below.
    process.crawl(spider_name)

process.start()
print("***********Execution time : {0}".format(datetime.now() - start))

Separately, I also tried launching each spider in a shell:

import os

for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    # pwd is just to confirm the working directory is correct, and it is
    os.system("pwd")
    os.system("pwd && scrapy crawl " + spider_name)

However, the spiders do not actually run when launched through os.system. Another option would be a .sh script, but I am not sure that is a good idea. I am looking for a solution to run the spiders one after another.
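If spawning a separate process per spider is acceptable, a more robust variant of the os.system approach is subprocess.check_call, which blocks until each crawl exits and raises on a non-zero exit code instead of failing silently. A minimal sketch, assuming the script is launched from the Scrapy project root (SpiderLoader is used here only to enumerate the spider names):

import subprocess

from scrapy.spiderloader import SpiderLoader
from scrapy.utils.project import get_project_settings

# Discover spider names without starting a crawl or a reactor.
spider_names = SpiderLoader.from_settings(get_project_settings()).list()

for spider_name in spider_names:
    print("Running spider %s" % spider_name)
    # Blocks until this crawl's process exits; raises CalledProcessError
    # on a non-zero exit code.
    subprocess.check_call(["scrapy", "crawl", spider_name])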

0 Answers