/usr/bin/scrapy won't start

Asked: 2014-04-10 23:54:50

Tags: python scrapy fedora yum

Scrapy will no longer start. yum ran an update today, and it has not worked since. I tried removing Scrapy and reinstalling it with pip, without success.

Scrapy 0.22.2

$ uname -a    
Linux localhost.localdomain 3.13.9-100.fc19.x86_64 #1 SMP Fri Apr 4 00:51:59 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ python -V
Python 2.7.5

$ scrapy
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 122, in execute
    cmds = _get_commands_dict(settings, inproject)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 46, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 29, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 20, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/lib/python2.7/site-packages/scrapy/utils/misc.py", line 68, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/usr/lib/python2.7/site-packages/scrapy/commands/deploy.py", line 14, in <module>
    from w3lib.form import encode_multipart
  File "/usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/form.py", line 3, in <module>
    if six.PY2:
AttributeError: 'module' object has no attribute 'PY2'

six.PY2?

$ python
Python 2.7.5 (default, Nov 12 2013, 16:18:42) 
[GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import six
>>> six.PY2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'PY2'
>>> six.PY3
False
>>>
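
So the installed six has PY3 but not PY2, meaning it predates the release that added six.PY2. A quick diagnostic to see which six the interpreter is actually loading (a sketch; run it with the same python that scrapy uses):

# Print the location and version of the six module Python picks up.
import six

print(six.__file__)          # where the module was loaded from
print(six.__version__)       # version of the module on disk
print(hasattr(six, "PY2"))   # False here, which points at an old six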

Deleting the reference to six.PY2 from /usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/form.py lets it start:

#import six
#if six.PY2:
#    from cStringIO import StringIO as BytesIO
#else:
#    from io import BytesIO

import six
from cStringIO import StringIO as BytesIO
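
A less invasive edit would keep the file portable without relying on six at all; a sketch of the same idea (not the upstream w3lib code):

# cStringIO exists only on Python 2, so the ImportError fallback
# covers Python 3 without consulting six.
try:
    from cStringIO import StringIO as BytesIO
except ImportError:
    from io import BytesIO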

But then running scrapy crawl MySpider fails:

$ scrapy crawl MySpider
Starting domain scrape
2014-04-10 19:43:39-0400 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapy_myspider)
2014-04-10 19:43:39-0400 [scrapy] INFO: Optional features available: ssl, http11, boto
2014-04-10 19:43:39-0400 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scrapy_myspider.spiders', 'CLOSESPIDER_TIMEOUT': 40, 'SPIDER_MODULES': ['scrapy_myspider.spiders'], 'LOG_LEVEL': 'INFO', 'RETRY_ENABLED': False, 'HTTPCACHE_DIR': '/tmp/scrapy_cache', 'HTTPCACHE_ENABLED': True, 'RETRY_TIMES': 1, 'BOT_NAME': 'scrapy_myspider', 'AJAXCRAWL_ENABLED': True, 'CONCURRENT_ITEMS': 400, 'COOKIES_ENABLED': False, 'DOWNLOAD_TIMEOUT': 14}
2014-04-10 19:43:40-0400 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, DefaultHeadersMiddleware, AjaxCrawlMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, ChunkedTransferMiddleware, DownloaderStats, HttpCacheMiddleware
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled item pipelines: 
2014-04-10 19:43:41-0400 [MySpider] INFO: Spider opened
2014-04-10 19:43:41-0400 [MySpider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-10 19:43:41-0400 [MySpider] ERROR: Obtaining request from start requests
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
        self.mainLoop()
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
        self.runUntilCurrent()
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
    --- <exception caught here> ---
      File "/usr/lib/python2.7/site-packages/scrapy/core/engine.py", line 111, in _next_request
        request = next(slot.start_requests)
      File "/usr/lib/python2.7/site-packages/scrapy/spider.py", line 50, in start_requests
        yield self.make_requests_from_url(url)
      File "/usr/lib/python2.7/site-packages/scrapy/spider.py", line 53, in make_requests_from_url
        return Request(url, dont_filter=True)
      File "/usr/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 26, in __init__
        self._set_url(url)
      File "/usr/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 52, in _set_url
        self._url = escape_ajax(safe_url_string(url))
      File "/usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/url.py", line 52, in safe_url_string
        return moves.urllib.parse.quote(s, _safe_chars)
    exceptions.AttributeError: '_MovedItems' object has no attribute 'urllib'

2014-04-10 19:43:41-0400 [MySpider] INFO: Closing spider (finished)
2014-04-10 19:43:41-0400 [MySpider] INFO: Dumping Scrapy stats:
    {'finish_reason': 'finished',
     'finish_time': datetime.datetime(2014, 4, 10, 23, 43, 41, 120645),
     'log_count/ERROR': 1,
     'log_count/INFO': 7,
     'start_time': datetime.datetime(2014, 4, 10, 23, 43, 41, 114721)}
2014-04-10 19:43:41-0400 [MySpider] INFO: Spider closed (finished)
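
Both tracebacks point at the same six module. The failing call can be reproduced outside Scrapy (a minimal repro sketch, assuming the same interpreter scrapy uses):

# Reproduces the w3lib failure directly: on this six, moves has no
# urllib attribute, so the same AttributeError is raised.
from six import moves

moves.urllib.parse.quote("http://example.com/", "")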

Any ideas where to start? Is the problem the yum update, or Python? :D

Additional info

$ pip freeze
...
six==1.6.1
...

$ python
>>> import six
>>> six.__file__
'/usr/lib/python2.7/site-packages/six.pyc'

$ yumdb info python-six
Loaded plugins: langpacks, refresh-packagekit
python-six-1.3.0-1.fc19.noarch
     checksum_data = [**redacted**]
     checksum_type = sha256
     command_line = install python-bugzilla python-requests python-urllib3 python-six
     from_repo = fedora
     from_repo_revision = 1372417620
     from_repo_timestamp = 1372419845
     installed_by = 1000
     origin_url = http://fedora.mirror.constant.com/linux/releases/19/Everything/x86_64/os/Packages/p/python-six-1.3.0-1.fc19.noarch.rpm
     reason = user
     releasever = 19
     var_uuid = b2714b4a-0654-4c5c-8405-80724410fdde

$ yum info python-six
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
Name        : python-six
Arch        : noarch
Version     : 1.3.0
Release     : 1.fc19
Size        : 50 k
Repo        : installed
From repo   : fedora
Summary     : Python 2 and 3 compatibility utilities
URL         : http://pypi.python.org/pypi/six/
License     : MIT
Description : python-six provides simple utilities for wrapping over differences between
            : Python 2 and Python 3.
            : 
            : This is the Python 2 build of the module.

More info

$ repoquery -lq python-six
/usr/lib/python2.7/site-packages/six-1.3.0-py2.7.egg-info
/usr/lib/python2.7/site-packages/six.py
/usr/lib/python2.7/site-packages/six.pyc
/usr/lib/python2.7/site-packages/six.pyo
/usr/share/doc/python-six-1.3.0
/usr/share/doc/python-six-1.3.0/LICENSE
/usr/share/doc/python-six-1.3.0/README
/usr/share/doc/python-six-1.3.0/index.rst
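
So pip's metadata says six==1.6.1 while the six.py on disk belongs to the 1.3.0 RPM; presumably the yum update put the distro's older six.py back over the pip-installed copy. The mismatch can be shown in one place (a sketch using pkg_resources, which ships with setuptools):

# Compare the version pip's metadata advertises with the module that
# actually imports; disagreement means stale or overwritten files.
import pkg_resources
import six

print(pkg_resources.get_distribution("six").version)  # what pip freeze reports
print(six.__version__)                                 # what really imports
print(six.__file__)                                    # and from where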

Solved?

Did the following:

$ wget http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py
$ python virtualenv.py ~/venv/base
$ echo 'source ~/venv/base/bin/activate' >> ~/.bash_profile

Logged out of the Gnome session and logged back in.

$ pip install --user scrapy

scrapy now runs fine.
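
To double-check, the earlier diagnostics can be re-run (a sketch; the comments describe what one would expect after the reinstall):

# six should now resolve to the freshly installed copy rather than
# the distro's /usr/lib/python2.7/site-packages/six.py.
import six

print(six.__file__)          # should no longer be the system six.pyc
print(six.__version__)       # should be the newer pip-installed version
print(hasattr(six, "PY2"))   # True on a current six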

Solved x2, see below.

1 Answer:

Answer 0 (score: 0)

Solved

Uninstalled scrapy:

$ sudo pip uninstall scrapy

Removed source ~/venv/base/bin/activate from ~/.bash_profile

Now getting the following error:

$ pip install --user scrapy
The temporary folder for building (/tmp/pip-build-dave) is not owned by your user!
pip will not work until the temporary folder is either deleted or owned by your user account.
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    load_entry_point('pip==1.3.1', 'console_scripts', 'pip')()
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 351, in load_entry_point
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2363, in load_entry_point
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2088, in load
  File "/usr/lib/python2.7/site-packages/pip/__init__.py", line 9, in <module>
    from pip.util import get_installed_distributions, get_prog
  File "/usr/lib/python2.7/site-packages/pip/util.py", line 15, in <module>
    from pip.locations import site_packages, running_under_virtualenv, virtualenv_no_global
  File "/usr/lib/python2.7/site-packages/pip/locations.py", line 64, in <module>
    build_prefix = _get_build_prefix()
  File "/usr/lib/python2.7/site-packages/pip/locations.py", line 54, in _get_build_prefix
    raise pip.exceptions.InstallationError(msg)
pip.exceptions.InstallationError: The temporary folder for building (/tmp/pip-build-dave) is not owned by your user!

So...

$ sudo rm -rf /tmp/pip-build-dave

...

$ pip install --user scrapy
$ scrapy

Works now! BTW, thanks to cdunklau and ε in #python on freenode! =)