How do I pass system command-line arguments to Scrapy CrawlerProcess?

Time: 2017-09-11 21:59:27

Tags: python-2.7 scrapy

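I have a Scrapy spider that I run with the scrapy crawl command, passing it command-line arguments. I am now trying to run this spider with CrawlerProcess instead of the command line. How can I pass all of the same command-line arguments to the crawler process?

scrapy crawl example -o data.jl -t jsonlines -s JOBDIR=/crawlstate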

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
# How do I pass arguments like -o data.jl -t jsonlines -s JOBDIR=/crawlstate here?
process.crawl('example')
process.start()

1 answer:

Answer 0 (score: 2)

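You can modify the project settings before passing them to the CrawlerProcess constructor. Setting priority='cmdline' gives each value the same precedence it would have if it had been supplied on the command line: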

...
settings = get_project_settings()
# Equivalent of -o data.jl
settings.set('FEED_URI', 'data.jl', priority='cmdline')
# Equivalent of -t jsonlines
settings.set('FEED_FORMAT', 'jsonlines', priority='cmdline')
# Equivalent of -s JOBDIR=/crawlstate
settings.set('JOBDIR', '/crawlstate', priority='cmdline')
process = CrawlerProcess(settings)
...
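
If the arguments arrive as a real argument list rather than hard-coded values, the same idea can be sketched as below. This is only an illustration, not part of Scrapy: the helper names cli_args_to_settings and run are hypothetical, and the parser handles only the three options from the question (-o, -t, -s), not the full scrapy crawl interface. It assumes, as in the answer above, that -o maps to FEED_URI and -t maps to FEED_FORMAT.

```python
import argparse


def cli_args_to_settings(argv):
    """Hypothetical helper: map a subset of `scrapy crawl` options to settings.

    Only -o, -t and -s NAME=VALUE are handled; this is not Scrapy's own parser.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('spider')
    parser.add_argument('-o', dest='output')
    parser.add_argument('-t', dest='format')
    parser.add_argument('-s', dest='raw_settings', action='append', default=[])
    args = parser.parse_args(argv)

    overrides = {}
    if args.output:
        overrides['FEED_URI'] = args.output      # assumed mapping for -o
    if args.format:
        overrides['FEED_FORMAT'] = args.format   # assumed mapping for -t
    for item in args.raw_settings:
        # Split NAME=VALUE on the first '=' only, so values may contain '='.
        name, _, value = item.partition('=')
        overrides[name] = value
    return args.spider, overrides


def run(argv):
    """Hypothetical entry point: apply the overrides, then start the crawl."""
    # Imported here so the helper above can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    spider, overrides = cli_args_to_settings(argv)
    settings = get_project_settings()
    for name, value in overrides.items():
        settings.set(name, value, priority='cmdline')
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()
```

Calling run(['example', '-o', 'data.jl', '-t', 'jsonlines', '-s', 'JOBDIR=/crawlstate']) from inside the project directory would then mirror the command from the question; passing sys.argv[1:] instead would forward whatever the script itself was started with.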