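I've run into a problem that has been posted here many times: I'm trying to run two crawler processes in a single script and I get a ReactorNotRestartable error.
The answers I've seen show how to run spiders simultaneously or sequentially, but without passing any arguments in between. I need to run one spider, do some processing on its output, and then run a second spider whose arguments are set from the first spider's output. Here is my pseudocode: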
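# Imports (dispatcher comes from the PyDispatcher package)
from pydispatch import dispatcher
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def do_shallow_scrape():
    results = []

    def crawler_results(signal, sender, item, response, spider):
        # Collect every item the spider emits
        results.append(item)

    dispatcher.connect(crawler_results, signal=signals.item_passed)
    process = CrawlerProcess(get_project_settings())
    process.crawl('list')
    process.start()  # blocks until the crawl finishes
    return results


def do_details_scrape(ids):
    results = []

    def crawler_results(signal, sender, item, response, spider):
        results.append(item)

    dispatcher.connect(crawler_results, signal=signals.item_passed)
    process = CrawlerProcess(get_project_settings())
    process.crawl('details', ids=ids)
    process.start()  # second start in the same process: ReactorNotRestartable
    return results


def coordinate_scrape():
    shallow_results = do_shallow_scrape()
    ids = some_script(shallow_results)  # post-process the first spider's output
    details = do_details_scrape(ids)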
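One direction that might work, as an untested sketch rather than a confirmed fix: the Twisted reactor cannot be restarted within one process, so each crawl could run in its own child process and hand its results back to the parent. The helper names `run_in_fresh_process` and `_child` below are hypothetical. The sketch assumes a POSIX system (it uses the "fork" start method, which is not available on Windows) and assumes the returned items can be pickled.

```python
import multiprocessing as mp


def _child(conn, func, args, kwargs):
    """Run func in the child process and send its outcome back over the pipe."""
    try:
        conn.send((True, func(*args, **kwargs)))
    except Exception as exc:
        # Report the failure to the parent instead of leaving it waiting
        conn.send((False, repr(exc)))
    finally:
        conn.close()


def run_in_fresh_process(func, *args, **kwargs):
    """Call func(*args, **kwargs) in a new process and return its result.

    Hypothetical helper: every call gets its own process, and therefore its
    own, never-started Twisted reactor. The result must be picklable.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX only
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_end, func, args, kwargs))
    proc.start()
    send_end.close()  # parent keeps only the receiving end
    try:
        ok, payload = recv_end.recv()
    except EOFError:
        # The child died before it could send anything
        ok, payload = False, "child process exited without a result"
    finally:
        proc.join()
        recv_end.close()
    if not ok:
        raise RuntimeError(f"call in child process failed: {payload}")
    return payload


# Intended use with the functions from the question (not executed here):
#   shallow_results = run_in_fresh_process(do_shallow_scrape)
#   ids = some_script(shallow_results)
#   details = run_in_fresh_process(do_details_scrape, ids)
```

With this shape the parent process never starts a reactor itself; it only coordinates, so arguments can be computed between the two crawls. If the items turn out not to be picklable, converting each one to a plain dict before returning it may help.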
Thanks in advance for your help!