When my code runs, Scrapy prints stats like these:
2016-11-18 06:41:38 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 656,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 2661,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 11, 18, 14, 41, 38, 759760),
'item_scraped_count': 2,
'log_count/DEBUG': 5,
'log_count/INFO': 7,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2016, 11, 18, 14, 41, 37, 807590)}
My goal is to access request_count or response_count from process_response, or from any method in the spider. I want to close the spider once it has crawled N total URLs.
Answer (score: 1)
If you want to close the spider based on the number of requests it has completed, I recommend using CLOSESPIDER_PAGECOUNT in settings.py (https://doc.scrapy.org/en/latest/topics/extensions.html#closespider-pagecount):
settings.py
CLOSESPIDER_PAGECOUNT = 20  # end the crawl after 20 pages have been crawled
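As a side note, the same setting can also be applied per spider through the spider's custom_settings attribute instead of settings.py. A minimal sketch, assuming a hypothetical spider name and start URL (not from the original question):

import scrapy

class MySpider(scrapy.Spider):
    # hypothetical spider used only to illustrate the setting
    name = "my_spider"
    start_urls = ["http://example.com"]

    # Scrapy closes the spider after 20 pages have been crawled
    custom_settings = {"CLOSESPIDER_PAGECOUNT": 20}

    def parse(self, response):
        yield {"url": response.url}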
If you want to access the Scrapy stats inside your spider, you can do it like this:
self.crawler.stats.get_value('my_stat_name')  # e.g. 'downloader/response_count' or 'downloader/request_count'
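For example, to close the spider yourself once N URLs have been crawled, you could read the stat in a callback and raise CloseSpider. A rough sketch under assumed names (MySpider, the start URL, and the threshold are placeholders, not part of the original answer):

import scrapy
from scrapy.exceptions import CloseSpider

class MySpider(scrapy.Spider):
    name = "my_spider"            # hypothetical spider name
    start_urls = ["http://example.com"]

    MAX_RESPONSES = 20            # assumed value of N

    def parse(self, response):
        yield {"url": response.url}
        # read the running stat and stop once the limit is reached
        count = self.crawler.stats.get_value('response_received_count', 0)
        if count >= self.MAX_RESPONSES:
            raise CloseSpider('reached response limit')

Raising CloseSpider stops the crawl gracefully and records the reason in 'finish_reason', whereas CLOSESPIDER_PAGECOUNT does the same check for you without any extra code.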