环境:
Scrapy 0.16.2 双绞线12.2.0 python 2.7 MacOSX的-10.6
这是我的问题:
我尝试运行
scrapy shell http://aaa.17domn.com/bt9/file.php/MERH77V.html
错误:
[ScrapyHTTPPageGetter,client] Unhandled Error
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
why = getattr(selectable, method)()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/tcp.py", line 202, in doRead
return self._dataReceived(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/tcp.py", line 208, in _dataReceived
rval = self.protocol.dataReceived(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/protocols/basic.py", line 564, in dataReceived
why = self.lineReceived(line)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 50, in lineReceived
return HTTPClient.lineReceived(self, line.rstrip())
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 450, in lineReceived
self.extractHeader(self._header)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 406, in extractHeader
key, val = header.split(':',1)
exceptions.ValueError: need more than 1 value to unpack
我从https://groups.google.com/forum/#!msg/scrapy-users/xFKo8ggzPxs/VXDl3CZ4V4cJ找到了解决方案 他们形容这是由扭曲引起的。然后我在http://twistedmatrix.com/trac/ticket/2842的/twisted/web/http.py中修补了函数extractHeader。它的作品
但是,坚持到现在!
我运行另一个网站
scrapy shell http://www1.wkdown.info/fs3/file.php/M994ATR.html
错误:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/defer.py", line 551, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 122, in _build_response
status = int(self.status)
ValueError: invalid literal for int() with base 10: 'html'
我觉得响应标题会发生一些事情。 Scrapy无法很好地处理它。 任何的想法? 谢谢!