I can't change the spider's settings from inside the parse method, but there must be a way to do it.
For example:
class SomeSpider(BaseSpider):
    name = 'mySpider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com']
    settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.FirstPipeline']
    print settings['ITEM_PIPELINES'][0]
    # printed 'myproject.pipelines.FirstPipeline'

    def parse(self, response):
        # ...some code
        settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.SecondPipeline']
        print settings['ITEM_PIPELINES'][0]
        # printed 'myproject.pipelines.SecondPipeline'
        item = Myitem()
        item['name'] = 'Name for SecondPipeline'
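But the item is still processed by FirstPipeline. The new ITEM_PIPELINES value has no effect. How can I change settings after the crawl has started? Thanks in advance!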
Answer 0 (score: 3)
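If you want different spiders to use different pipelines, you can give each spider a pipelines attribute that lists the pipelines that apply to it. Each pipeline then checks whether it is listed: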
class MyPipeline(object):
    def process_item(self, item, spider):
        # Skip items from spiders that did not opt in to this pipeline.
        if self.__class__.__name__ not in getattr(spider, 'pipelines', []):
            return item
        ...
        return item

class MySpider(CrawlSpider):
    pipelines = set([
        'MyPipeline',
        'MyPipeline3',
    ])
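The name check above can be tried without Scrapy installed. The following is a minimal sketch, not the answerer's code: `StubSpider`, `NameGatedPipeline`, `handle`, and `run_chain` are hypothetical names introduced only for illustration, and plain dicts stand in for Scrapy items.

```python
class NameGatedPipeline(object):
    """Hypothetical base class: act only if the spider lists this class name."""

    def process_item(self, item, spider):
        # Same check as in the answer: look the class name up in spider.pipelines.
        if self.__class__.__name__ not in getattr(spider, 'pipelines', []):
            return item  # spider did not opt in; pass the item through untouched
        return self.handle(item, spider)

    def handle(self, item, spider):
        raise NotImplementedError


class FirstPipeline(NameGatedPipeline):
    def handle(self, item, spider):
        item['handled_by'] = item.get('handled_by', []) + ['FirstPipeline']
        return item


class SecondPipeline(NameGatedPipeline):
    def handle(self, item, spider):
        item['handled_by'] = item.get('handled_by', []) + ['SecondPipeline']
        return item


class StubSpider(object):
    """Hypothetical stand-in for a spider; only the `pipelines` attribute matters."""

    def __init__(self, pipelines):
        self.pipelines = set(pipelines)


def run_chain(item, spider, chain):
    """Pass the item through every pipeline in order."""
    for pipeline in chain:
        item = pipeline.process_item(item, spider)
    return item


chain = [FirstPipeline(), SecondPipeline()]
spider = StubSpider(['SecondPipeline'])
result = run_chain({'name': 'x'}, spider, chain)
print(result['handled_by'])  # ['SecondPipeline']
```

In this sketch the check reads `spider.pipelines` every time an item is processed, so reassigning that attribute while the spider runs changes which pipelines act on later items. That may be a workable substitute for changing ITEM_PIPELINES mid-crawl, provided every candidate pipeline is enabled in the settings from the start.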
If you want different pipelines to handle different item types, you can do this:
class MyPipeline2(object):
    def process_item(self, item, spider):
        # Only act on MyItem instances; pass everything else through.
        if isinstance(item, MyItem):
            ...
            return item
        return item
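A small self-contained sketch of the type-based dispatch, for illustration only: `MyItem` and `OtherItem` are hypothetical dict subclasses standing in for Scrapy Item classes, and the `processed` flag is an invented marker for the elided processing step.

```python
class MyItem(dict):
    """Hypothetical item type; a Scrapy Item subclass would play the same role."""


class OtherItem(dict):
    """A second hypothetical item type that this pipeline should ignore."""


class MyPipeline2(object):
    def process_item(self, item, spider):
        # Only MyItem instances are modified; other items pass through unchanged.
        if isinstance(item, MyItem):
            item['processed'] = True
        return item


pipeline = MyPipeline2()
a = pipeline.process_item(MyItem(name='a'), spider=None)
b = pipeline.process_item(OtherItem(name='b'), spider=None)
print(a.get('processed'), b.get('processed'))  # True None
```

With this approach the spider decides which pipeline applies simply by choosing which item class to yield from parse.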
Answer 1 (score: 0)
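Based on the informative issue #4196, combined with the telnet console, it is possible to do this even after the crawl has started.
Attach a telnet client to the port (e.g. 1234) and password that are logged when the scrapy crawl command is launched, then issue the following interactive Python statements to modify the currently running downloader: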
$ telnet 127.0.0.1 6023 # Read the actual port from logs.
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Username: scrapy
Password: <copy-from-logs>
>>> engine.downloader.total_concurrency
8
>>> engine.downloader.total_concurrency = 32
>>> est()
Execution engine status
time()-engine.start_time : 14226.62803554535
engine.has_capacity() : False
len(engine.downloader.active) : 28
engine.scraper.is_idle() : False
engine.spider.name : <foo>
engine.spider_is_idle(engine.spider) : False
engine.slot.closing : False
len(engine.slot.inprogress) : 32
len(engine.slot.scheduler.dqs or []) : 531
len(engine.slot.scheduler.mqs) : 0
len(engine.scraper.slot.queue) : 0
len(engine.scraper.slot.active) : 0
engine.scraper.slot.active_size : 0
engine.scraper.slot.itemproc_size : 0
engine.scraper.slot.needs_backout() : False