Question

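I have a Scrapy project that uses a custom middleware and a custom pipeline to check and store entries in a Postgres database. The middleware looks roughly like this: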

class ExistingLinkCheckMiddleware(object):

    def __init__(self):
        # ... open connection to database
        pass

    def process_request(self, request, spider):
        # ... before each request check in the DB
        # that the page hasn't been scraped before
        return None

The pipeline looks similar:

class MachinelearningPipeline(object):

    def __init__(self):
        # ... open connection to database
        pass

    def process_item(self, item, spider):
        # ... save the item to the database
        return item

It works fine, but I can't find a way to cleanly close these database connections when the spider finishes, which bothers me.

Does anyone know how to do that?

Answer 1

I think the best way is to use Scrapy's spider_closed signal, for example:

from scrapy import signals


class ExistingLinkCheckMiddleware(object):

    def __init__(self):
        # open connection to database
        pass

    @classmethod
    def from_crawler(cls, crawler):
        # Newer Scrapy releases no longer ship scrapy.xlib.pydispatch,
        # so connect through the crawler's signal manager instead.
        middleware = cls()
        crawler.signals.connect(middleware.spider_closed,
                                signal=signals.spider_closed)
        return middleware

    def spider_closed(self, spider, reason):
        # close db connection
        pass

    def process_request(self, request, spider):
        # before each request check in the DB
        # that the page hasn't been scraped before
        return None
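
For the pipeline side, a signal is probably not needed at all: Scrapy calls the optional open_spider and close_spider methods on item pipelines when the spider starts and finishes. Below is a minimal sketch of that idea. Note the assumptions: sqlite3 stands in for the Postgres driver so the snippet runs on its own, and the items table and its url column are hypothetical names, not taken from the question.

```python
import sqlite3


class MachinelearningPipeline(object):
    """Pipeline sketch; sqlite3 is a stand-in for the real Postgres driver."""

    def open_spider(self, spider):
        # Called by Scrapy once, when the spider is opened.
        self.conn = sqlite3.connect(":memory:")
        # Hypothetical table and column, for illustration only.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY)"
        )

    def close_spider(self, spider):
        # Called by Scrapy once, when the spider is closed.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Save the item, skipping URLs that are already stored.
        self.conn.execute(
            "INSERT OR IGNORE INTO items (url) VALUES (?)", (item["url"],)
        )
        return item
```

With a real Postgres driver the connect call and the SQL placeholder style would differ, but the open/close lifecycle should stay the same. Downloader middlewares do not get these two hooks, which is why the middleware above relies on the spider_closed signal.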

Hope that helps.

Closing database connections from pipelines and middleware in Scrapy
