I am using Scrapy's default RetryMiddleware to re-download failed URLs, and I want it to also handle pages whose responses come back with a 429 status code ("Too Many Requests").
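For reference, I asked RetryMiddleware to retry 429 responses in settings.py roughly like this (a sketch; the exact default list of retry codes varies between Scrapy versions):

RETRY_ENABLED = True
# The usual retry codes plus 429; the built-in default list differs by version
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]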
But I got this error:
Traceback (most recent call last):
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 46, in process_response
    response = method(request=request, response=response, spider=spider)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/downloadermiddlewares/retry.py", line 58, in process_response
    reason = response_status_message(response.status)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/utils/response.py", line 58, in response_status_message
    reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace')
AttributeError: 'NoneType' object has no attribute 'decode'
While debugging I found that, before actually retrying the download, Scrapy's RetryMiddleware first builds a string describing why the previous attempt failed. To do this, the response_status_message function combines the status code with its status text, for example:
>>> response_status_message(404)
'404 Not Found'
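Based on the traceback, the relevant code in scrapy/utils/response.py looks roughly like this (a sketch reconstructed from the stack trace, not necessarily the exact source of every Scrapy version):

from twisted.web import http

def response_status_message(status):
    # Look up the textual reason for the code in Twisted's status table;
    # get() returns None when the code is missing and no default is given
    reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace')
    return '%s %s' % (status, reason)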
To look up the status text, it uses Twisted's mapping via http.RESPONSES.get(int(status)). But for a non-standard HTTP status code that is missing from that mapping, and with no default argument passed to get(), the lookup returns None instead of a string. Scrapy then calls decode('utf8', errors='replace') on that None, which raises the AttributeError above.
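The failure is easy to reproduce in isolation (assuming a Twisted version whose RESPONSES table has no entry for 429):

>>> from twisted.web import http
>>> http.RESPONSES.get(404)
'Not Found'
>>> http.RESPONSES.get(429) is None
True
>>> http.RESPONSES.get(429).decode('utf8', errors='replace')
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'decode'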
Is there a way to avoid this?
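One workaround I can imagine (an untested sketch; the module path myproject.middlewares is just a placeholder) is to subclass RetryMiddleware so that the retry reason is built from the numeric status code alone, bypassing response_status_message entirely:

from scrapy.downloadermiddlewares.retry import RetryMiddleware

class SafeRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            # Build the reason from the numeric code only, so a status
            # missing from Twisted's RESPONSES table cannot crash us
            reason = 'response status %d' % response.status
            return self._retry(request, reason, spider) or response
        return response

and then swap it in for the built-in middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'myproject.middlewares.SafeRetryMiddleware': 550,
}

But maybe there is a cleaner way?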