Scrapy will no longer start. yum ran an update today and now it no longer works. I tried removing scrapy and reinstalling it with pip, but no luck.
Scrapy 0.22.2
$ uname -a
Linux localhost.localdomain 3.13.9-100.fc19.x86_64 #1 SMP Fri Apr 4 00:51:59 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ python -V
Python 2.7.5
$ scrapy
Traceback (most recent call last):
File "/usr/bin/scrapy", line 4, in <module>
execute()
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 122, in execute
cmds = _get_commands_dict(settings, inproject)
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 46, in _get_commands_dict
cmds = _get_commands_from_module('scrapy.commands', inproject)
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 29, in _get_commands_from_module
for cmd in _iter_command_classes(module):
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 20, in _iter_command_classes
for module in walk_modules(module_name):
File "/usr/lib/python2.7/site-packages/scrapy/utils/misc.py", line 68, in walk_modules
submod = import_module(fullpath)
File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/usr/lib/python2.7/site-packages/scrapy/commands/deploy.py", line 14, in <module>
from w3lib.form import encode_multipart
File "/usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/form.py", line 3, in <module>
if six.PY2:
AttributeError: 'module' object has no attribute 'PY2'
six.PY2?
$ python
Python 2.7.5 (default, Nov 12 2013, 16:18:42)
[GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import six
>>> six.PY2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'PY2'
>>> six.PY3
False
>>>
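six did not grow the PY2 flag until later releases (around 1.4, as far as I know), so an AttributeError here points at an old six being imported. A quick one-liner to check which six the interpreter actually picks up (just a diagnostic sketch, your paths may differ):

$ python -c "import six; print six.__version__, six.__file__"

If the version printed is older than what pip installed, a stale copy is shadowing it on sys.path.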
Removing the reference to six.PY2 in /usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/form.py lets it start. Changed:
#import six
#if six.PY2:
# from cStringIO import StringIO as BytesIO
#else:
# from io import BytesIO
to:
import six
from cStringIO import StringIO as BytesIO
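For what it's worth, a patch that does not depend on six at all would be safer here. This is just a sketch of the same branch done with sys.version_info, which is essentially all that six.PY2 checks internally, if I understand six correctly:

import sys

# pick the right BytesIO without six; sys.version_info[0] == 2 is
# what six.PY2 boils down to in newer six releases
if sys.version_info[0] == 2:
    from cStringIO import StringIO as BytesIO
else:
    from io import BytesIO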
But then trying to run scrapy crawl MySpider fails:
$ scrapy crawl MySpider
Starting domain scrape
2014-04-10 19:43:39-0400 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapy_myspider)
2014-04-10 19:43:39-0400 [scrapy] INFO: Optional features available: ssl, http11, boto
2014-04-10 19:43:39-0400 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scrapy_myspider.spiders', 'CLOSESPIDER_TIMEOUT': 40, 'SPIDER_MODULES': ['scrapy_myspider.spiders'], 'LOG_LEVEL': 'INFO', 'RETRY_ENABLED': False, 'HTTPCACHE_DIR': '/tmp/scrapy_cache', 'HTTPCACHE_ENABLED': True, 'RETRY_TIMES': 1, 'BOT_NAME': 'scrapy_myspider', 'AJAXCRAWL_ENABLED': True, 'CONCURRENT_ITEMS': 400, 'COOKIES_ENABLED': False, 'DOWNLOAD_TIMEOUT': 14}
2014-04-10 19:43:40-0400 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, DefaultHeadersMiddleware, AjaxCrawlMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, ChunkedTransferMiddleware, DownloaderStats, HttpCacheMiddleware
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled item pipelines:
2014-04-10 19:43:41-0400 [MySpider] INFO: Spider opened
2014-04-10 19:43:41-0400 [MySpider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-10 19:43:41-0400 [MySpider] ERROR: Obtaining request from start requests
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
self.mainLoop()
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
self.runUntilCurrent()
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 41, in __call__
return self._func(*self._a, **self._kw)
--- <exception caught here> ---
File "/usr/lib/python2.7/site-packages/scrapy/core/engine.py", line 111, in _next_request
request = next(slot.start_requests)
File "/usr/lib/python2.7/site-packages/scrapy/spider.py", line 50, in start_requests
yield self.make_requests_from_url(url)
File "/usr/lib/python2.7/site-packages/scrapy/spider.py", line 53, in make_requests_from_url
return Request(url, dont_filter=True)
File "/usr/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 26, in __init__
self._set_url(url)
File "/usr/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 52, in _set_url
self._url = escape_ajax(safe_url_string(url))
File "/usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/url.py", line 52, in safe_url_string
return moves.urllib.parse.quote(s, _safe_chars)
exceptions.AttributeError: '_MovedItems' object has no attribute 'urllib'
2014-04-10 19:43:41-0400 [MySpider] INFO: Closing spider (finished)
2014-04-10 19:43:41-0400 [MySpider] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 4, 10, 23, 43, 41, 120645),
'log_count/ERROR': 1,
'log_count/INFO': 7,
'start_time': datetime.datetime(2014, 4, 10, 23, 43, 41, 114721)}
2014-04-10 19:43:41-0400 [MySpider] INFO: Spider closed (finished)
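six.moves.urllib is another feature this six is too old for (the urllib moves also arrived around six 1.4, as far as I can tell), which is why patching form.py only moved the failure to the next missing attribute. A small probe that distinguishes the two cases (diagnostic sketch only):

import six

print six.__version__
try:
    six.moves.urllib.parse  # lazy module, present in newer six
    print "six.moves.urllib is available"
except AttributeError:
    print "six.moves.urllib is missing; this six predates w3lib 1.5's needs"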
Any ideas where to start? A problem with the yum update, or with python? :D
Additional info
$ pip freeze
...
six==1.6.1
...
$ python
>>> import six
>>> six.__file__
/usr/lib/python2.7/site-packages/six.pyc
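pip freeze reads the egg-info metadata on disk, while import six loads the first six module found on sys.path, so the two can disagree. Printing the version from the imported module should expose the mismatch (sketch):

$ python -c "import six; print six.__version__"

If that prints 1.3.0 while pip freeze says 1.6.1, a stale module file is shadowing the pip-installed one.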
$ yumdb info python-six
Loaded plugins: langpacks, refresh-packagekit
python-six-1.3.0-1.fc19.noarch
checksum_data = [**redacted**]
checksum_type = sha256
command_line = install python-bugzilla python-requests python-urllib3 python-six
from_repo = fedora
from_repo_revision = 1372417620
from_repo_timestamp = 1372419845
installed_by = 1000
origin_url = http://fedora.mirror.constant.com/linux/releases/19/Everything/x86_64/os/Packages/p/python-six-1.3.0-1.fc19.noarch.rpm
reason = user
releasever = 19
var_uuid = b2714b4a-0654-4c5c-8405-80724410fdde
$ yum info python-six
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
Name : python-six
Arch : noarch
Version : 1.3.0
Release : 1.fc19
Size : 50 k
Repo : installed
From repo : fedora
Summary : Python 2 and 3 compatibility utilities
URL : http://pypi.python.org/pypi/six/
License : MIT
Description : python-six provides simple utilities for wrapping over differences between
: Python 2 and Python 3.
:
: This is the Python 2 build of the module.
More info
$ repoquery -lq python-six
/usr/lib/python2.7/site-packages/six-1.3.0-py2.7.egg-info
/usr/lib/python2.7/site-packages/six.py
/usr/lib/python2.7/site-packages/six.pyc
/usr/lib/python2.7/site-packages/six.pyo
/usr/share/doc/python-six-1.3.0
/usr/share/doc/python-six-1.3.0/LICENSE
/usr/share/doc/python-six-1.3.0/README
/usr/share/doc/python-six-1.3.0/index.rst
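So the fedora RPM owns the exact six.py that Python imports, which would explain six showing as 1.6.1 in pip freeze while the 1.3.0 module wins at import time. rpm can confirm which package owns the file (sketch):

$ rpm -qf /usr/lib/python2.7/site-packages/six.py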
Solved?
Did the following:
$ wget http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py
$ python virtualenv.py ~/venv/base
$ echo 'source ~/venv/base/bin/activate' >> ~/.bash_profile
Logged out of the Gnome session and logged back in.
$ pip install --user scrapy
$ scrapy
It now runs fine.
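To verify the fix took, checking which six the interpreter now resolves should show the user/venv copy rather than the system RPM one (sketch):

$ python -c "import six; print six.__version__, six.__file__"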
SOLVED x2, see below.
Answer 0 (score: 0)
SOLVED
Uninstalled scrapy:
$ sudo pip uninstall scrapy
Removed source ~/venv/base/bin/activate from ~/.bash_profile.
Now got the following error:
$ pip install --user scrapy
The temporary folder for building (/tmp/pip-build-dave) is not owned by your user!
pip will not work until the temporary folder is either deleted or owned by your user account.
Traceback (most recent call last):
File "/usr/bin/pip", line 9, in <module>
load_entry_point('pip==1.3.1', 'console_scripts', 'pip')()
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 351, in load_entry_point
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2363, in load_entry_point
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2088, in load
File "/usr/lib/python2.7/site-packages/pip/__init__.py", line 9, in <module>
from pip.util import get_installed_distributions, get_prog
File "/usr/lib/python2.7/site-packages/pip/util.py", line 15, in <module>
from pip.locations import site_packages, running_under_virtualenv, virtualenv_no_global
File "/usr/lib/python2.7/site-packages/pip/locations.py", line 64, in <module>
build_prefix = _get_build_prefix()
File "/usr/lib/python2.7/site-packages/pip/locations.py", line 54, in _get_build_prefix
raise pip.exceptions.InstallationError(msg)
pip.exceptions.InstallationError: The temporary folder for building (/tmp/pip-build-dave) is not owned by your user!
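The root-owned /tmp/pip-build-dave was presumably left behind by an earlier sudo pip run; pip refuses to reuse a build directory owned by another user. Checking the ownership before removing it (sketch):

$ ls -ld /tmp/pip-build-dave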
So...
$ sudo rm -rf /tmp/pip-build-dave
...
$ pip install --user scrapy
$ scrapy
It works now! BTW, thanks to cdunklau and ε in #python on freenode! =)