Can't see dumped stats in Scrapy

Date: 2014-07-10 15:12:56

Tags: amazon-s3 statistics scrapy

When I run the example provided in the Scrapy tutorial, I can see the log printed to stdout:

2014-07-10 16:08:21+0100 [pubs] INFO: Spider opened
2014-07-10 16:08:21+0100 [pubs] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2014-07-10 16:08:21+0100 [pubs] INFO: Closing spider (finished)
2014-07-10 16:08:21+0100 [pubs] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 471,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 3897,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 7, 10, 15, 8, 21, 970741),
'item_scraped_count': 1,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2014, 7, 10, 15, 8, 21, 584373)}
2014-07-10 16:08:21+0100 [pubs] INFO: Spider closed (finished)

However, when I change the 'FEED_URI' setting to export the resulting file to S3, I can't see the stats anywhere. I've tried printing crawler.stats.spider_stats, but it stays empty. Any ideas? A minimal sketch of the kind of feed-export settings involved is shown below (the bucket path and credentials are placeholders, not the asker's actual values):
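FEED_URI = 's3://my-bucket/items/%(name)s-%(time)s.json'  # hypothetical bucket and path
FEED_FORMAT = 'json'
AWS_ACCESS_KEY_ID = 'YOUR_ACCESS_KEY'          # placeholder
AWS_SECRET_ACCESS_KEY = 'YOUR_SECRET_KEY'      # placeholder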

1 answer:

Answer 0 (score: 0):

I couldn't get Scrapy to dump the stats either, even with 'LOG_ENABLED' and 'DUMP_STATS' set to True. However, I found a workaround: dump the stats manually by adding this line of code at the end of the reactor run:

log.msg("Dumping Scrapy stats:\n" + pprint.pformat(crawler.stats.get_stats()))