Scrapy not making all the requests

Time: 2016-05-16 08:44:05

Tags: web-scraping scrapy

I am using Scrapy to download a list of pages. At this point I am not extracting any data, so I am just saving response.body to a CSV file.

I am not crawling either, so the start URLs are the only URLs I need to fetch; I have a list of 400 of them:

start_urls = ['url_1', 'url_2', 'url_3', ..., 'url_400']
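
For reference, the spider is essentially the following minimal sketch (the spider name and item field are placeholders, not my exact code):

import scrapy


class PageSpider(scrapy.Spider):
    name = 'pages'
    start_urls = ['url_1', 'url_2', 'url_3']  # ... up to url_400

    def parse(self, response):
        # No extraction yet: yield the raw page source so the CSV
        # feed export (-o pages.csv) writes one row per page.
        yield {'body': response.body}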

But I only get the source for about 170 of them, with no information about what is happening to the rest.

This is the log I get at the end:

2016-05-16 04:30:25 [scrapy] INFO: Closing spider (finished)
2016-05-16 04:30:25 [scrapy] INFO: Stored csv feed (166 items) in: pages.csv
2016-05-16 04:30:25 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 11,
 'downloader/exception_type_count/twisted.internet.error.TimeoutError': 6,
 'downloader/exception_type_count/twisted.web._newclient.ResponseFailed': 5,
 'downloader/request_bytes': 95268,
 'downloader/request_count': 180,
 'downloader/request_method_count/GET': 180,
 'downloader/response_bytes': 3931169,
 'downloader/response_count': 169,
 'downloader/response_status_count/200': 166,
 'downloader/response_status_count/404': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 5, 16, 9, 0, 25, 461208),
 'item_scraped_count': 166,
 'log_count/DEBUG': 350,
 'log_count/INFO': 17,
 'response_received_count': 169,
 'scheduler/dequeued': 180,
 'scheduler/dequeued/memory': 180,
 'scheduler/enqueued': 180,
 'scheduler/enqueued/memory': 180,
 'start_time': datetime.datetime(2016, 5, 16, 8, 50, 34, 443699)}
2016-05-16 04:30:25 [scrapy] INFO: Spider closed (finished)
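
From the stats, two things stand out: only 180 of my 400 start URLs were ever enqueued (scheduler/enqueued: 180), which would fit Scrapy's duplicate filter silently dropping repeated URLs, and 11 of those 180 requests failed with TimeoutError or ResponseFailed. Below is a sketch of what I could try next, bypassing the dupefilter and logging every failure (the errback and dont_filter usage is my assumption, untested):

import scrapy


class PageSpider(scrapy.Spider):
    name = 'pages'
    start_urls = ['url_1', 'url_2', 'url_3']  # ... up to url_400

    # Retry failing requests more aggressively before giving up.
    custom_settings = {'RETRY_TIMES': 5, 'DOWNLOAD_TIMEOUT': 60}

    def start_requests(self):
        for url in self.start_urls:
            # dont_filter=True keeps the dupefilter from silently
            # dropping repeated URLs in the list.
            yield scrapy.Request(url, callback=self.parse,
                                 errback=self.on_error,
                                 dont_filter=True)

    def parse(self, response):
        yield {'body': response.body}

    def on_error(self, failure):
        # Failed requests (TimeoutError, ResponseFailed, ...) land
        # here instead of disappearing from the log.
        self.logger.error('Request failed: %s (%s)',
                          failure.request.url, failure.value)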

0 Answers