I'm using scrapy-redis and modified next_request as follows:
def next_request(self):
    block_pop_timeout = self.idle_before_close
    request = self.queue.pop(block_pop_timeout)
    while not request:
        time.sleep(30)
        request = self.queue.pop(block_pop_timeout)
    if request and self.stats:
        self.stats.inc_value('scheduler/dequeued/redis', spider=self.spider)
    return request
But there is a problem: time.sleep pauses the whole spider, even while other requests are still being processed. Is there a better way to wait for new requests?
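For contrast, here is a minimal stdlib sketch of why a blocking pop with a timeout beats a fixed sleep loop. queue.Queue is only a hypothetical stand-in for the Redis queue (it is not the scrapy-redis API): the consumer wakes the moment an item arrives instead of sleeping a full 30 seconds between polls.

```python
import queue
import threading
import time

# Hypothetical stand-in for the Redis queue: Queue.get() supports a
# blocking wait with a timeout, analogous to a blocking pop.
q = queue.Queue()

def producer():
    time.sleep(0.2)            # a new request arrives 200 ms later
    q.put("request-1")

threading.Thread(target=producer).start()

start = time.monotonic()
item = q.get(timeout=5)        # wakes as soon as the item arrives
elapsed = time.monotonic() - start

print(item, round(elapsed, 1))  # the wait is ~0.2 s, not a fixed 30 s
```

The same idea applies to the Redis-backed queue: waiting on the queue itself, rather than sleeping inside next_request, keeps the scheduler responsive.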
I solved it by adding this code to my spider:
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

def spider_idle(self, spider):
    # Instead of sleeping, tell the engine not to close the spider;
    # it stays idle until new requests appear in the Redis queue.
    raise DontCloseSpider()

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
    crawler.signals.connect(spider.spider_idle, signals.spider_idle)
    return spider
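To see why raising DontCloseSpider works, here is a toy simulation of the idle-signal mechanism. The Engine class below is hypothetical (it is not Scrapy's real internals): when the engine goes idle it calls the connected handlers, and any handler that raises DontCloseSpider vetoes the shutdown.

```python
class DontCloseSpider(Exception):
    """Raised by an idle handler to keep the spider alive."""

class Engine:
    """Toy stand-in for Scrapy's engine/signal manager (not the real API)."""
    def __init__(self):
        self.idle_handlers = []

    def connect(self, handler):
        self.idle_handlers.append(handler)

    def on_idle(self):
        # Mirrors the behavior described above: if any idle handler
        # raises DontCloseSpider, the engine skips closing the spider.
        for handler in self.idle_handlers:
            try:
                handler()
            except DontCloseSpider:
                return "kept open"
        return "closed"

def spider_idle():
    raise DontCloseSpider()

engine = Engine()
before = engine.on_idle()   # no handler connected: spider would close
engine.connect(spider_idle)
after = engine.on_idle()    # handler vetoes shutdown
print(before, after)        # → closed kept open
```

Because the handler returns control to the engine immediately (no sleep), in-flight requests keep being processed while the spider waits for new work.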