Question

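I'm using Scrapy to crawl several different websites, and I have one item per site (each extracts different information).

For example, I have a generic pipeline, since most of the information is the same, but now I'm crawling some Google search responses and the pipeline has to be different.

For example:

GenericItem uses GenericPipeline

But GoogleItem uses GoogleItemPipeline, and when the spider crawls it tries to use GenericPipeline instead of GoogleItemPipeline. How can I specify which pipeline the Google spider must use?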

Answer 1

At the moment there is only one way: check the item type inside the pipeline, then either process the item or return it "as is".

pipelines.py:

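from grabbers.items import FeedItem

class StoreFeedPost(object):

    # Signature as written in this answer (an older Scrapy API);
    # later Scrapy releases document process_item(self, item, spider).
    def process_item(self, domain, item):
        if isinstance(item, FeedItem):
            # process it...
            pass

        # Always return the item so later pipelines still receive it.
        return item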

items.py:

from scrapy.item import ScrapedItem

class FeedItem(ScrapedItem):
    pass
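
To show how the type check plays out with the two pipelines from the question, here is a minimal sketch that runs without Scrapy installed. The item classes are plain dict subclasses standing in for real Scrapy items, and run_pipelines is a hypothetical helper that only imitates every enabled pipeline receiving every item in order. The process_item(self, item, spider) signature is the one used by later Scrapy releases, not the older one shown above.

```python
# Stand-in item classes (hypothetical); in a real project these would
# be Scrapy item classes defined in items.py.
class GenericItem(dict):
    pass


class GoogleItem(dict):
    pass


class GenericPipeline:
    def process_item(self, item, spider):
        # Handle only GenericItem; anything else passes through untouched.
        if isinstance(item, GenericItem):
            item["handled_by"] = "GenericPipeline"
        return item


class GoogleItemPipeline:
    def process_item(self, item, spider):
        # Handle only GoogleItem; anything else passes through untouched.
        if isinstance(item, GoogleItem):
            item["handled_by"] = "GoogleItemPipeline"
        return item


def run_pipelines(item, pipelines, spider=None):
    """Hypothetical helper: feed one item through each pipeline in order."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


pipelines = [GenericPipeline(), GoogleItemPipeline()]
generic_result = run_pipelines(GenericItem(title="a feed post"), pipelines)
google_result = run_pipelines(GoogleItem(title="a search hit"), pipelines)

print(generic_result["handled_by"])
print(google_result["handled_by"])
```

Because each pipeline returns items it does not recognise unchanged, both can stay enabled at the same time (in a real project, through the ITEM_PIPELINES setting) and each item is only modified by the pipeline meant for it. Note that this relies on GoogleItem not being a subclass of GenericItem; if it were, the isinstance check in GenericPipeline would match it as well.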
