Is there any way to manually send an item to a pipeline in Scrapy?

Time: 2012-12-17 02:23:29

Tags: python scrapy

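I am having a few problems sending items to the pipeline, because my request passes through several functions.

I am hoping there is some manual way to send an item object to a Scrapy pipeline, since I don't know Scrapy's internals.

Suppose I have a function like this:
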
def parseDetails(self, response):
  item = DmozItem()
  item['test'] = "mytest"

  # Hypothetical call: this is what I would like to be able to do
  sendToPipeline(pipelineName, item)

2 answers:

Answer 0: (score: 0)

scrapy/commands/parse.py

def parseDetails(self, response):
  item = DmozItem()
  item['test'] = "mytest"

  # Call pipeline.
  itemproc = self.crawler.engine.scraper.itemproc
  itemproc.process_item(item, self)

  return item

Answer 1: (score: 0)

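If you delegate directly to the ItemPipelineManager, an exception raised by a pipeline (DropItem in the log below) goes unhandled inside the manager: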

[2018-07-21 20:00:02] CRITICAL - Unhandled error in Deferred:

[2018-07-21 20:00:02] CRITICAL -
Traceback (most recent call last):
  File "/home/vagrant/.local/share/virtualenvs/vagrant-gKDsaKU3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/vagrant/monitor/pipelines/filter.py", line 24, in process_item
    raise DropItem()
scrapy.exceptions.DropItem

It may also unintentionally change the state of the pipelines and affect later processing.
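A minimal sketch of how that state change can bite, using stand-in classes rather than Scrapy itself. `DedupePipeline` and `FakeManager` are hypothetical names invented for this illustration, and `DropItem` here is a local stand-in for `scrapy.exceptions.DropItem`. The point is only that running the whole pipeline chain by hand also runs every stateful pipeline in it.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class DedupePipeline:
    """Hypothetical stateful pipeline: drops items whose id was already seen."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        if item["id"] in self.seen:
            raise DropItem(f"duplicate id {item['id']}")
        self.seen.add(item["id"])
        return item


class FakeManager:
    """Hypothetical stand-in for the pipeline manager: runs every pipeline in order."""

    def __init__(self, pipelines):
        self.middlewares = pipelines

    def process_item(self, item, spider):
        for pipe in self.middlewares:
            item = pipe.process_item(item, spider)
        return item


manager = FakeManager([DedupePipeline()])
item = {"id": 1}

# Manual call from the callback: the dedupe pipeline records id 1.
manager.process_item(item, spider=None)

# The same item then goes through normal processing and is dropped,
# because the manual call already changed the pipeline's state.
try:
    manager.process_item(item, spider=None)
    dropped = False
except DropItem:
    dropped = True

print(dropped)  # True
```

In a real crawl the second pass would happen when the callback returns or yields the item, which is why calling a single, known pipeline is less risky than calling the whole manager.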

I think a better approach is to grab the pipeline instance you are looking for and call it directly:

try:
    # Manually call the filter
    f = utils.get_pipeline_instance(self, FilterPipeline)
    f.process_item(p, self)
except DropItem:
    pass

Using this helper function:

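from scrapy.exceptions import NotConfigured

def get_pipeline_instance(spider, pipeline_class):
    # Search the pipelines that are enabled for this crawl
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    # No enabled pipeline is an instance of pipeline_class
    raise NotConfigured('Invalid pipeline')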
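
To try the lookup without running a crawl, here is a rough, self-contained sketch. Everything in it apart from the helper's logic is an assumption made for illustration: the exception classes are local stand-ins for the ones in `scrapy.exceptions`, `FilterPipeline` and `StoragePipeline` are hypothetical pipelines, `passes_filter` is a hypothetical wrapper, and the fake spider only mimics the `spider.crawler.engine.scraper.itemproc.middlewares` attribute chain used above.

```python
from types import SimpleNamespace


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class NotConfigured(Exception):
    """Local stand-in for scrapy.exceptions.NotConfigured."""


class FilterPipeline:
    """Hypothetical filter: drops items that have no 'test' value."""

    def process_item(self, item, spider):
        if not item.get("test"):
            raise DropItem("missing 'test' field")
        return item


class StoragePipeline:
    """Hypothetical second pipeline, present only so the lookup has to search."""

    def process_item(self, item, spider):
        return item


def get_pipeline_instance(spider, pipeline_class):
    # Same lookup as the helper above, against the fake attribute chain.
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    raise NotConfigured("Invalid pipeline")


def passes_filter(spider, item):
    """Run only FilterPipeline on the item; report whether it survived."""
    try:
        get_pipeline_instance(spider, FilterPipeline).process_item(item, spider)
    except DropItem:
        return False
    return True


# Fake spider exposing only the attributes the helper touches.
pipelines = [StoragePipeline(), FilterPipeline()]
spider = SimpleNamespace(
    crawler=SimpleNamespace(
        engine=SimpleNamespace(
            scraper=SimpleNamespace(
                itemproc=SimpleNamespace(middlewares=pipelines)
            )
        )
    )
)

print(passes_filter(spider, {"test": "mytest"}))  # True
print(passes_filter(spider, {}))  # False
```

Because only the one pipeline is called, the other pipelines' state is left alone, and `DropItem` is caught where the call is made.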