I have a Scrapy spider that I pass system parameters to using the scrapy crawl command. I am trying to run this spider with CrawlerProcess instead of the command line. How do I pass all of the same command-line arguments to this crawler process?
scrapy crawl example -o data.jl -t jsonlines -s JOBDIR=/crawlstate
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('example')  # How do I pass arguments like -o data.jl -t jsonlines -s JOBDIR=/crawlstate here?
process.start()
Answer 0 (score: 2)
You can modify the project settings before passing them to the CrawlerProcess constructor:
...
settings = get_project_settings()
settings.set('FEED_URI', 'data.jl', priority='cmdline')
settings.set('FEED_FORMAT', 'jsonlines', priority='cmdline')
settings.set('JOBDIR', '/crawlstate', priority='cmdline')
process = CrawlerProcess(settings)
...
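Putting the answer together, a complete runner script might look like the sketch below (it assumes your project defines a spider registered under the name `example`, as in the question). Note that since Scrapy 2.1 the `FEED_URI`/`FEED_FORMAT` pair has been superseded by the single `FEEDS` setting, though the older names still work on earlier versions:

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()

# Equivalent of: scrapy crawl example -o data.jl -t jsonlines -s JOBDIR=/crawlstate
# The 'cmdline' priority mirrors what the scrapy CLI itself uses, so these
# values override anything set at project level in settings.py.
settings.set('FEED_URI', 'data.jl', priority='cmdline')
settings.set('FEED_FORMAT', 'jsonlines', priority='cmdline')
settings.set('JOBDIR', '/crawlstate', priority='cmdline')

process = CrawlerProcess(settings)
process.crawl('example')  # spider name as registered in the project
process.start()           # blocks here until the crawl is finished
```

Run the script from inside the project directory so that `get_project_settings()` can find `scrapy.cfg`; otherwise the spider name will not resolve.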