How can I override FEED_URI based on CLI kwargs in a parent class?

Asked: 2015-10-02 18:47:08

Tags: python web-scraping scrapy

I want to set the following in my parent spider class, since it should be the same for every child spider. How can I do that?

scrapy crawl spiderX -a full  >> FEED_URI = /xx/spiderX_full
scrapy crawl spiderX -a quick >> FEED_URI = /xx/spiderX_quick

This is what I have so far:

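from os import path

@classmethod
def update_settings(cls, settings):
    # Copy so the class-level custom_settings dict is not mutated.
    settings_dict = dict(cls.custom_settings or {})
    # FEED_DIR appears to be a custom project setting holding the output directory.
    feed_uri = path.join(settings.get('FEED_DIR'), cls.name)
    settings_dict['FEED_URI'] = feed_uri
    settings.setdict(settings_dict, priority='spider')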

How can I access the quick/full args from this function? I tried doing this:

def __new__(cls, full=False, quick=False, *a, **kw):
    cls.full = full
    cls.quick = quick
    return super(MyCrawlSpider, cls).__new__(cls, *a, **kw)

But apparently update_settings runs before it, so the arguments are not yet set when the settings are built.
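The ordering problem described above can be reproduced with a toy model. This is only a sketch: `ParentSpider`, `toy_crawl` and the `log` attribute are hypothetical names, and the code does not use Scrapy at all. It assumes only what the question reports, namely that the class-level hook runs before any spider instance exists.

```python
class ParentSpider:
    """Toy stand-in for a spider base class (not Scrapy's Spider)."""
    name = "spiderX"
    log = []  # records the call order, for illustration only

    @classmethod
    def update_settings(cls, settings):
        # Class-level hook: only class attributes such as `name` are visible here.
        cls.log.append("update_settings")
        settings["FEED_URI"] = "/xx/%s" % cls.name

    def __init__(self, mode=None):
        # Instance-level arguments arrive only at this point.
        type(self).log.append("__init__")
        self.mode = mode


def toy_crawl(spider_cls, **spider_args):
    """Hypothetical driver: finalize settings from the class, then instantiate."""
    settings = {}
    spider_cls.log.clear()
    spider_cls.update_settings(settings)  # no instance exists yet
    spider = spider_cls(**spider_args)    # the arguments are applied afterwards
    return settings, spider, list(spider_cls.log)
```

In this model the URI is fixed before `mode` is known, which is why values set in `__new__` or `__init__` cannot influence it.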

1 Answer:

Answer 0: (score: 1)

Try using the -s argument, which overrides a setting for a single run:

scrapy crawl spiderX -s FEED_URI=s3://mybucket/path/to/export.csv
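One way to combine -s with the per-mode naming from the question is a URI template. Scrapy's feed export documentation describes storage URI parameters: %(name)s is replaced by the spider name, and any other named parameter is replaced by the spider attribute of the same name. The sketch below only mimics that substitution with plain %-formatting. The helper names, the `mode` argument (passed as `-a mode=full` rather than `-a full`, since -a expects NAME=VALUE) and the "/xx" directory are assumptions for illustration.

```python
import posixpath

FEED_DIR = "/xx"  # assumed output directory, taken from the question's example


def feed_uri_template(feed_dir=FEED_DIR):
    """Return a FEED_URI template with placeholders left for later substitution."""
    return posixpath.join(feed_dir, "%(name)s_%(mode)s")


def crawl_command(spider_name, mode, feed_dir=FEED_DIR):
    """Build the argv for one crawl: the mode goes in with -a, the template with -s."""
    return [
        "scrapy", "crawl", spider_name,
        "-a", "mode=%s" % mode,
        "-s", "FEED_URI=%s" % feed_uri_template(feed_dir),
    ]


def expand(template, spider_name, mode):
    """Mimic the placeholder substitution locally (this is not Scrapy's own code)."""
    return template % {"name": spider_name, "mode": mode}
```

Under these assumptions the same template could also be placed once in the parent class settings, so each child spider gets its own file per mode without overriding update_settings.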