My Scrapy spider is supposed to handle some sites that use AJAX. In theory it should work fine, and it does work when I drive it manually with fetch() in the Scrapy shell, but when I run "scrapy crawl ..." I don't see any POST requests in the log and no items get scraped. What could the problem be, and what is its root cause?
2016-10-09 01:11:16 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 9,
'downloader/exception_type_count/twisted.internet.error.DNSLookupError': 1,
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 8,
'downloader/request_bytes': 106652,
'downloader/request_count': 263,
'downloader/request_method_count/GET': 263,
'downloader/response_bytes': 5644786,
'downloader/response_count': 254,
'downloader/response_status_count/200': 252,
'downloader/response_status_count/301': 1,
'downloader/response_status_count/302': 1,
'dupefilter/filtered': 19,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 10, 8, 22, 11, 16, 949472),
'log_count/DEBUG': 265,
'log_count/INFO': 11,
'request_depth_max': 3,
'response_received_count': 252,
'scheduler/dequeued': 263,
'scheduler/dequeued/memory': 263,
'scheduler/enqueued': 263,
'scheduler/enqueued/memory': 263,
'start_time': datetime.datetime(2016, 10, 8, 22, 7, 7, 811163)}
2016-10-09 01:11:16 [scrapy] INFO: Spider closed (finished)
The URLs that appear in the log are:
http://www.spoilertv.com/feeds/posts/default/-/Reviews?start-index=501
http://www.spoilertv.com/feeds/posts/default/-/Reviews?start-index=751
http://www.spoilertv.com/feeds/posts/default/-/Reviews?start-index=1001
....
....
http://www.spoilertv.com/feeds/posts/default/-/Reviews?start-index=10001
Answer (score: 1)
The request returned by the parseProdPage method is never used in the parseCat method. You should start by yielding it: yield self.parseProdPage(response)
Also, you probably want to set dont_filter=True on that same request, otherwise most of them will be filtered out by the dupefilter (since they all have the same URL).