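I'm running into a few problems sending items to the pipeline, because my requests pass through multiple functions.
I'd just like some manual way to send an item object to a Scrapy pipeline, since I don't know Scrapy's internals.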
Suppose I have a function named parseDetails:
def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    sendToPiepline(piplineName, item)
Answer 0 (score: 0)
def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    # Call pipeline.
    itemproc = self.crawler.engine.scraper.itemproc
    itemproc.process_item(item, self)
    return item
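Answer 1 (score: 0)
If you delegate directly to the ItemPipelineManager and one of the pipelines drops the item, an unhandled exception is raised in the manager: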
[2018-07-21 20:00:02] CRITICAL - Unhandled error in Deferred:
[2018-07-21 20:00:02] CRITICAL -
Traceback (most recent call last):
  File "/home/vagrant/.local/share/virtualenvs/vagrant-gKDsaKU3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/vagrant/monitor/pipelines/filter.py", line 24, in process_item
    raise DropItem()
scrapy.exceptions.DropItem
This can also unintentionally change the state of the pipelines and affect later processing.
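As one illustration of how pipeline state could be affected, here is a minimal sketch that runs without Scrapy. The `DuplicatesPipeline` class and the `id` field are hypothetical, and `DropItem` is a local stand-in for `scrapy.exceptions.DropItem`. It assumes the callback both pushes the item manually and returns it, so the same pipeline instance sees the item twice.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (assumption: no Scrapy installed)."""


class DuplicatesPipeline:
    """Hypothetical stateful pipeline that remembers every id it has seen."""

    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        if item["id"] in self.ids_seen:
            raise DropItem(f"Duplicate item: {item['id']}")
        self.ids_seen.add(item["id"])
        return item


pipeline = DuplicatesPipeline()
item = {"id": 1, "test": "mytest"}

# First pass: the manual call made from inside the spider callback.
pipeline.process_item(item, spider=None)

# Second pass: the callback also returned the item, so the normal flow
# hands it to the same pipeline instance again.
try:
    pipeline.process_item(item, spider=None)
    dropped = False
except DropItem:
    dropped = True

print(dropped)
```

In this sketch the second pass is rejected as a duplicate, although the item is new from the spider's point of view.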
I think a better approach is to grab the instance of the pipeline you are looking for and call it directly:
try:
    # Manually call the filter
    f = utils.get_pipeline_instance(self, FilterPipeline)
    f.process_item(p, self)
except DropItem:
    pass
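Using this helper function:
from scrapy.exceptions import NotConfigured

def get_pipeline_instance(spider, pipeline_class):
    # The item pipeline manager holds the enabled pipeline instances.
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    else:
        # The loop finished without finding a matching pipeline.
        raise NotConfigured('Invalid pipeline')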
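To try the lookup outside a running crawl, here is a rough sketch that fakes the `spider.crawler.engine.scraper.itemproc.middlewares` chain with `SimpleNamespace` objects. Everything besides `get_pipeline_instance` is an assumption made for illustration: the two pipeline classes are hypothetical, and `DropItem` and `NotConfigured` are local stand-ins for the classes in `scrapy.exceptions`.

```python
from types import SimpleNamespace


class NotConfigured(Exception):
    """Local stand-in for scrapy.exceptions.NotConfigured."""


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class FilterPipeline:
    """Hypothetical filter: drops items whose 'test' field is empty."""

    def process_item(self, item, spider):
        if not item.get("test"):
            raise DropItem()
        return item


class OtherPipeline:
    """Hypothetical pipeline that passes items through unchanged."""

    def process_item(self, item, spider):
        return item


def get_pipeline_instance(spider, pipeline_class):
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    raise NotConfigured("Invalid pipeline")


# Fake object chain mirroring the attribute path used by the helper.
manager = SimpleNamespace(middlewares=[OtherPipeline(), FilterPipeline()])
scraper = SimpleNamespace(itemproc=manager)
engine = SimpleNamespace(scraper=scraper)
spider = SimpleNamespace(crawler=SimpleNamespace(engine=engine))

found = get_pipeline_instance(spider, FilterPipeline)
kept = found.process_item({"test": "mytest"}, spider)

# Only the chosen pipeline runs, and DropItem is handled by the caller.
try:
    found.process_item({"test": ""}, spider)
    dropped = False
except DropItem:
    dropped = True
```

Because the caller invokes only one pipeline, the other pipelines in the chain are not touched and the `DropItem` stays in the caller's own `try`/`except`.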