I wrote this downloader middleware:
def process_request(self, request, spider):
    request = self.change_proxy(request)  # set the request's proxy

def process_response(self, request, response, spider):
    if response.status != 200:
        self.delete_proxy()  # remove the unusable proxy
        return request.copy()  # re-schedule so process_request assigns a new proxy
    return response

def process_exception(self, request, exception, spider):
    self.delete_proxy()  # remove the unusable proxy
    # If I comment out the line below I stop getting 302s,
    # but then might I miss crawling some pages?
    return request.copy()  # re-schedule so process_request assigns a new proxy
First, some background: I have a pool of proxies stored in redis, and rotating through them is what I want this middleware to do.

But after my spider has been running for a few minutes, I always start getting 302 responses, or process_exception is called constantly. Why? If I restart the spider, it again works fine only for the first few minutes... (so the proxy IPs themselves are good — what is wrong with my code?)

How can I do this the right way?
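For reference, here is a minimal self-contained sketch of the proxy-pool bookkeeping that my change_proxy / delete_proxy helpers are meant to do. It uses an in-memory set as a stand-in for redis, and ProxyPool, pick, and drop_current are hypothetical names for illustration, not redis or Scrapy APIs:

```python
import random

class ProxyPool:
    """In-memory stand-in for the redis proxy set.
    Assumption: redis just holds 'host:port' strings."""

    def __init__(self, proxies):
        self.proxies = set(proxies)
        self.current = None  # proxy assigned to the most recent request

    def pick(self):
        """Choose a proxy for the next request (what change_proxy would call)."""
        if not self.proxies:
            raise RuntimeError("proxy pool exhausted")
        self.current = random.choice(sorted(self.proxies))
        return self.current

    def drop_current(self):
        """Discard the proxy that just failed (what delete_proxy would call)."""
        if self.current is not None:
            self.proxies.discard(self.current)
            self.current = None


pool = ProxyPool(["1.1.1.1:8080", "2.2.2.2:8080", "3.3.3.3:8080"])
p = pool.pick()      # some proxy from the pool
pool.drop_current()  # a 302 or an exception means this proxy is burned
assert p not in pool.proxies and len(pool.proxies) == 2
```

In the middleware, process_request would then do something like request.meta['proxy'] = 'http://' + pool.pick(), and the failure paths would call pool.drop_current() before re-scheduling the request.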