I wrote a script to run two spiders in the same process and produce an output file for each. When the first spider finishes crawling before the second, I get the output I expect. However, if the second spider finishes before the first, the script terminates without waiting for the first spider to complete its crawl. What could be the cause, and what changes should I make to my code?
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

setting = get_project_settings()
process = CrawlerProcess(setting)

for spider_name in process.spider_loader.list():
    setting['FEED_FORMAT'] = 'json'
    setting['LOG_LEVEL'] = 'INFO'
    setting['FEED_URI'] = spider_name + '.json'
    setting['LOG_FILE'] = spider_name + '.log'
    process = CrawlerProcess(setting)
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)

process.start()
print("Completed")
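For reference, the pattern in Scrapy's documentation for running multiple spiders in the same process registers every crawl on a single CrawlerProcess and calls start() once, so the reactor does not stop until every registered crawler has finished. Below is a minimal sketch of that pattern; the two trivial spider classes are hypothetical placeholders (a real project would use its own spiders, e.g. via spider_loader.list() as above):

```python
import scrapy
from scrapy.crawler import CrawlerProcess


class SpiderOne(scrapy.Spider):
    # Hypothetical placeholder; it issues no requests, so its crawl ends immediately.
    name = 'spider_one'
    start_urls = []


class SpiderTwo(scrapy.Spider):
    # Second hypothetical placeholder spider.
    name = 'spider_two'
    start_urls = []


process = CrawlerProcess()
# Register both crawls on the SAME CrawlerProcess instance before starting
# the reactor; start() then blocks until all registered crawlers are done.
process.crawl(SpiderOne)
process.crawl(SpiderTwo)
process.start()
print("Completed")
```

Per-spider values such as FEED_URI and LOG_FILE can be supplied through each spider class's `custom_settings` attribute instead of mutating one shared settings object between iterations.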