Deploying a Scrapy project

Date: 2013-01-19 12:12:22

Tags: python deployment scrapy scrapyd

I am trying to deploy a Scrapy project with scrapyd. I can run the project normally with:
cd /var/www/api/scrapy/dirbot
scrapy crawl dmoz

Here is what I did, step by step:

1/ I ran

scrapy version -v
>> Scrapy  : 0.16.3
lxml    : 3.0.2.0
libxml2 : 2.7.8
Twisted : 12.2.0
Python  : 2.7.3 (default, Aug  1 2012, 05:14:39) - [GCC 4.6.3]
Platform: Linux-3.2.0-31-virtual-x86_64-with-Ubuntu-12.04-precise

2/ I installed scrapyd with
aptitude install scrapyd-0.16

3/ My project lives in /var/www/api/scrapy/dirbot (http://domain.com/api/scrapy/dirbot). I edited scrapy.cfg:

[settings]
default = dirbot.settings
[deploy:scrapyd2]
url = http://domain.com/api/scrapy/dirbot/
username = vu
password = hoang

4/ I tested the configuration with the deploy command

scrapy deploy -l
>> scrapyd2             http://domain.com/api/scrapy/dirbot/
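
For reference, the listing above can be approximated in a few lines of Python 3. This is only a sketch of how `[deploy:NAME]` sections could be read with the standard `configparser` module, not Scrapy's actual implementation. The helper name `list_deploy_targets` is hypothetical, and the sample text mirrors the scrapy.cfg from step 3.

```python
import configparser

# Sample configuration copied from the scrapy.cfg shown in step 3.
SAMPLE_CFG = """\
[settings]
default = dirbot.settings
[deploy:scrapyd2]
url = http://domain.com/api/scrapy/dirbot/
username = vu
password = hoang
"""


def list_deploy_targets(cfg_text):
    """Return {target_name: url} for every [deploy:NAME] section (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section.startswith("deploy:"):
            # The part after "deploy:" is the target name used on the command line.
            name = section.split(":", 1)[1]
            targets[name] = parser.get(section, "url")
    return targets


if __name__ == "__main__":
    for name, url in list_deploy_targets(SAMPLE_CFG).items():
        print(name, url)
```

A listing like this only confirms that the target is defined in scrapy.cfg; it does not check that anything is answering at that URL.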

5/ But when I run the following commands, both fail with a 404:

scrapy deploy -L scrapyd2
>> /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/settings/deprecated.py:23: ScrapyDeprecationWarning: You are using the following settings which are deprecated or obsolete (ask scrapy-users@googlegroups.com for alternatives):
    BOT_VERSION: no longer used (user agent defaults to Scrapy now)
  warnings.warn(msg, ScrapyDeprecationWarning)
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.16.3', 'scrapy')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/cmdline.py", line 131, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/cmdline.py", line 76, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/cmdline.py", line 138, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/commands/deploy.py", line 76, in run
    f = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

scrapy deploy scrapyd2 -p project
>> /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/settings/deprecated.py:23: ScrapyDeprecationWarning: You are using the following settings which are deprecated or obsolete (ask scrapy-users@googlegroups.com for alternatives):
    BOT_VERSION: no longer used (user agent defaults to Scrapy now)
  warnings.warn(msg, ScrapyDeprecationWarning)
Building egg of project-1358597244
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying project-1358597244 to http://domain.com/api/scrapy/dirbot/addversion.json
Deploy failed (404):
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /api/scrapy/dirbot/addversion.json was not found on this server.</p>
<hr>
<address>Apache/2.2.22 (Ubuntu) Server at domain.com Port 80</address>
</body></html>

* I don't understand what a Python egg is. Can you give an example? I don't know whether I have one. Maybe it is this file, /var/www/api/scrapy/dirbot/setup.py?

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = dirbot.settings']},
)

* How do I deploy my project? I don't know what I did wrong or which step I missed.

Thanks

1 answer:

Answer 0 (score: 1)

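Judging from the error, the site you want to crawl seems to be returning a 404: either you entered the wrong site, or there is a configuration error somewhere.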
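
Reading the log in the question more closely, the 404 appears to come from the deploy upload rather than from a crawl: Scrapy posts to the target `url` with `addversion.json` appended, and the error page is signed by Apache on port 80, which suggests the request never reached a scrapyd service. The sketch below (Python 3, standard library only) reproduces how that endpoint URL is derived. `addversion_url` is a hypothetical helper, and the second URL is an assumption: it supposes scrapyd is running on `domain.com` and listening on its default port, 6800.

```python
from urllib.parse import urljoin


def addversion_url(target_url):
    """Join a deploy target URL with scrapyd's addversion.json endpoint (hypothetical helper)."""
    return urljoin(target_url, "addversion.json")


if __name__ == "__main__":
    # URL from the question's scrapy.cfg: a project directory served by Apache.
    print(addversion_url("http://domain.com/api/scrapy/dirbot/"))
    # Assumed alternative: scrapyd's own HTTP service on its default port 6800.
    print(addversion_url("http://domain.com:6800/"))
```

If the first URL is what the `[deploy:scrapyd2]` section resolves to, pointing `url` at the address where scrapyd itself listens would be the first thing to try.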

Regarding Python eggs, there is a good answer; see "What is a Python egg?"
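
As a rough illustration: a `.egg` file is a zip archive holding the project's packages plus an `EGG-INFO` metadata directory, and the "Building egg of project-1358597244" step in the question is what produces one from setup.py. The sketch below (Python 3) builds a tiny stand-in archive in memory and lists its contents. The file names and contents are made up for illustration and are not the output of a real build; `build_demo_egg` and `list_egg` are hypothetical helpers.

```python
import io
import zipfile


def build_demo_egg():
    """Create an in-memory zip laid out like a minimal egg (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as egg:
        # The project's own package.
        egg.writestr("dirbot/__init__.py", "")
        egg.writestr("dirbot/settings.py", "BOT_NAME = 'dirbot'\n")
        # Metadata directory; the entry point mirrors setup.py in the question.
        egg.writestr("EGG-INFO/PKG-INFO", "Name: project\nVersion: 1.0\n")
        egg.writestr("EGG-INFO/entry_points.txt",
                     "[scrapy]\nsettings = dirbot.settings\n")
    buf.seek(0)
    return buf


def list_egg(fileobj):
    """Return the names of all files stored in the archive."""
    with zipfile.ZipFile(fileobj) as egg:
        return egg.namelist()


if __name__ == "__main__":
    for name in list_egg(build_demo_egg()):
        print(name)
```

The same `list_egg` call should work on a real `.egg` file opened in binary mode, since it only relies on the archive being a zip.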

Regarding setup.py: as far as I know, it is used to install a Python application from source with the command python setup.py install

Edit: it seems I had confused this with the pip command, sorry.