Question

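I need some advice on how to proceed with my item pipeline. I need to POST an item to an API (this works) and use the response object to get the ID of the entity that was created (this works too), and then use that ID to populate another entity. Ideally, the item pipeline could return the entity ID. Basically, I have a one-to-many relationship that I need to encode in a NoSQL database. What is the best way to proceed?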

Answer 1

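Maybe I don't understand your question, but it sounds like you just need to call your submission code from the def close_spider(self, spider): method. Have you tried that?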
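A minimal sketch of that idea, assuming the items only need to be submitted once the crawl has finished. The post_entity callable, the "kind" and "parent_id" fields and the class name are hypothetical stand-ins for your own API client and item schema; only process_item and close_spider are Scrapy's pipeline hooks.

```python
class SubmitOnClosePipeline:
    """Buffer items during the crawl and submit them in close_spider."""

    def __init__(self, post_entity):
        # post_entity is a hypothetical callable: it POSTs a dict to the
        # API and returns the ID of the entity that was created.
        self.post_entity = post_entity
        self.parent = None
        self.children = []

    def process_item(self, item, spider):
        # Only collect here; nothing is sent while the spider is running.
        if item.get("kind") == "parent":
            self.parent = dict(item)
        else:
            self.children.append(dict(item))
        return item

    def close_spider(self, spider):
        # Nothing to link the children to if no parent was scraped.
        if self.parent is None:
            return
        # Create the parent first so that its ID is available ...
        parent_id = self.post_entity(self.parent)
        # ... then attach that ID to every child before posting it.
        for child in self.children:
            child["parent_id"] = parent_id
            self.post_entity(child)
```

The callable is passed to the constructor only to keep the sketch easy to try out. In a real project Scrapy builds the pipeline itself, so the client would have to be created inside the class or supplied through a from_crawler class method.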

Answer 2

The best approach is to use MongoDB, a NoSQL database that works well with Scrapy. A pipeline for MongoDB can be found here, and the process is explained in this tutorial.

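Now, as explained in Pablo Hoffman's solution, updating different items from different pipelines into one can be achieved with the following decorator on the process_item method of a Pipeline object, so that it checks the pipeline attribute of your spider to decide whether or not that pipeline should run. (I have not tested the code, but I hope it helps.)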

import functools
import logging

def check_spider_pipeline(process_item_method):

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):

        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=logging.DEBUG)
            return item

    return wrapper

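The decorator is used like this. The spider needs a pipeline attribute holding the pipeline classes that should process its items: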

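import scrapy

# BaseSpider was removed from Scrapy; scrapy.Spider is its replacement
class MySpider(scrapy.Spider):

    # the pipeline classes that should process this spider's items
    pipeline = {
        pipelines.Save,
        pipelines.Validate,
    }

    def parse(self, response):
        # insert scrapy goodness here
        return item

# item pipelines are plain classes; no base class is required
class Save:

    @check_spider_pipeline
    def process_item(self, item, spider):
        # more scrapy goodness here
        return item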
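To see the skipping behaviour without running a crawl, the decorator can be exercised with stand-in objects. This is only a sketch: the two Fake... classes are hypothetical and are not real spiders, and the decorator is repeated here without logging so that the snippet runs on its own.

```python
import functools


def check_spider_pipeline(process_item_method):
    # Same idea as the decorator above, with the logging left out.
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        if self.__class__ in spider.pipeline:
            return process_item_method(self, item, spider)
        # Not listed by this spider: pass the item through untouched.
        return item
    return wrapper


class Save:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["saved"] = True  # stand-in for the real work
        return item


class FakeSpiderWithSave:
    # Hypothetical stand-in for a spider that opts in to Save.
    pipeline = {Save}


class FakeSpiderWithoutSave:
    # Hypothetical stand-in for a spider that opts out of Save.
    pipeline = set()


print(Save().process_item({}, FakeSpiderWithSave()))     # {'saved': True}
print(Save().process_item({}, FakeSpiderWithoutSave()))  # {}
```

Every pipeline stays enabled in the project settings; each spider then picks the ones it wants through its pipeline attribute.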

Finally, you can get more help from this question.

Scrapy pipeline architecture - need to return a variable
