Question

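This may be a subquestion of "Passing arguments to process.crawl in Scrapy python", but the author of that question accepted an answer that does not address the subquestion I am asking myself.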

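Here is my problem: I cannot use scrapy crawl mySpider -a start_urls(myUrl) -o myData.json
Instead, I want/need to use crawlerProcess.crawl(spider). I have already figured out several ways to pass the arguments (and that part is answered in the question I linked anyway), but I cannot work out how to tell it to dump the data into myData.json, i.e. the -o myData.json part. Does anyone have a suggestion? Or am I just misunderstanding how it is supposed to work?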

Here is the code:

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = challenges(start_urls=["http://www.myUrl.html"])
crawlerProcess.crawl(spider)
# For now I am just trying to get this bit of code to work, but obviously it will become a loop later.

dispatcher.connect(handleSpiderIdle, signals.spider_idle)

log.start()
print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."

Answer 1

You need to specify it in the settings:

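from scrapy.crawler import CrawlerProcess

# MySpider is your own spider class.
process = CrawlerProcess({
    'FEED_URI': 'file:///tmp/export.json',
    'FEED_FORMAT': 'json',  # otherwise Scrapy falls back to its default format (JSON lines)
})

process.crawl(MySpider)
process.start()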
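
On newer Scrapy releases (2.1 and later, as far as I know) the FEED_URI / FEED_FORMAT pair is deprecated in favour of a single FEEDS setting. Below is a minimal sketch of the same idea under that assumption. The helper names feed_settings and run_crawl are made up for illustration and are not part of Scrapy; only CrawlerProcess, crawl() and start() are.

```python
def feed_settings(output_path, fmt="json"):
    """Build a settings dict that exports scraped items to output_path.

    Assumption: Scrapy 2.1+, where FEEDS maps each output path (or URI)
    to a dict of per-feed options such as the serialization format.
    """
    return {"FEEDS": {str(output_path): {"format": fmt}}}


def run_crawl(spider_cls, output_path, **spider_kwargs):
    """Run one spider in-process, roughly like `scrapy crawl ... -o output_path`."""
    # Imported lazily so feed_settings() can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=feed_settings(output_path))
    # Keyword arguments are forwarded to the spider, like `-a` on the command line.
    process.crawl(spider_cls, **spider_kwargs)
    process.start()  # blocks until the crawl has finished
```

With the names from the question, a call would look something like run_crawl(challenges, "myData.json", start_urls=["http://www.myUrl.html"]). Note that crawl() takes the spider class rather than an instance here.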

Scrapy process.crawl() export data to json
