I have a BaseSpider on Scrapy 0.20.0. I am trying to count the number of website URLs found and print that count as an INFO message when the spider finishes (closes). The problem is that I cannot print this simple integer variable at the end of the session: any print statement in the parse() or parse_item() functions fires far too early, long before the crawl is done.
I also looked at this question, but it seems somewhat outdated and it is not clear how to use it correctly, i.e. where to put the code (myspider.py, pipelines.py, etc.)?
Right now my spider code looks like this:
class MySpider(BaseSpider):
    ...
    foundWebsites = 0
    ...

    def parse(self, response):
        ...
        print "Found %d websites in this session.\n\n" % (self.foundWebsites)

    def parse_item(self, response):
        ...
        if item['website']:
            self.foundWebsites += 1
        ...
This obviously does not work as intended. Any better and simpler ideas?
Answer 0 (score: 1)
The first answer referred to works, without adding anything to pipelines.py. Just add the "answer" to your spider code, like this:
# To use "spider_closed" we also need:
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals

class MySpider(BaseSpider):
    ...
    foundWebsites = 0
    ...

    def __init__(self):
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def parse(self, response):
        ...

    def parse_item(self, response):
        ...
        if item['website']:
            self.foundWebsites += 1
        ...

    def spider_closed(self, spider):
        # The signal is sent for every spider; only react to our own.
        if spider is not self:
            return
        print "Found %d websites in this session.\n\n" % (self.foundWebsites)
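The mechanism behind this is plain publish/subscribe dispatch: the spider registers a callback for the spider_closed signal, and the engine fires that signal exactly once, after crawling ends, which is why the count is complete by then. Here is a minimal pure-Python sketch of that pattern that runs without Scrapy; the Dispatcher and CounterSpider classes are illustrative stand-ins, not Scrapy API:

```python
class Dispatcher:
    """Toy signal dispatcher: maps a signal name to a list of callbacks."""
    def __init__(self):
        self._handlers = {}

    def connect(self, handler, signal):
        self._handlers.setdefault(signal, []).append(handler)

    def send(self, signal, **kwargs):
        for handler in self._handlers.get(signal, []):
            handler(**kwargs)


class CounterSpider:
    def __init__(self, dispatcher):
        self.found_websites = 0
        # Register our callback for the "spider_closed" signal,
        # mirroring dispatcher.connect(...) in the answer above.
        dispatcher.connect(self.spider_closed, signal="spider_closed")

    def parse_item(self, item):
        # Count items that carry a website, as in the original parse_item().
        if item.get("website"):
            self.found_websites += 1

    def spider_closed(self, spider):
        if spider is not self:  # ignore close events from other spiders
            return
        print("Found %d websites in this session." % self.found_websites)


dispatcher = Dispatcher()
spider = CounterSpider(dispatcher)
for item in [{"website": "a.com"}, {}, {"website": "b.com"}]:
    spider.parse_item(item)
# The engine would send this once the crawl finishes:
dispatcher.send("spider_closed", spider=spider)  # prints "Found 2 websites in this session."
```

Note that the scrapy.xlib.pydispatch import matches Scrapy 0.20; in later Scrapy releases that module was removed and signal handlers are connected through crawler.signals instead.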