Scrapyd cannot find the project name

Date: 2014-04-01 15:56:50

Tags: scrapy, scrapyd

I get an error when I try to run an existing scrapy project on scrapyd.

I have a working scrapy project (url_finder) and, inside it, a working spider for testing purposes (test_ip_spider_1x) that simply downloads whatismyip.com.
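
The spider itself is not shown in the question; as a rough idea only, a minimal test spider of this kind might look something like the sketch below (the class name and logging are illustrative, the spider name matches the one used above):

import scrapy

class TestIpSpider(scrapy.Spider):
    # Spider name used when scheduling: scrapy crawl test_ip_spider_1x
    name = "test_ip_spider_1x"
    start_urls = ["http://www.whatismyip.com/"]

    def parse(self, response):
        # Just confirm the page was downloaded; a real spider would
        # extract the external IP from the response here.
        self.log("Downloaded %s (%d bytes)" % (response.url, len(response.body)))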

I installed scrapyd successfully (using apt-get), and now I want to run the spider on scrapyd. So I execute:

curl http://localhost:6800/schedule.json -d project=url_finder -d spider=test_ip_spider_1x

This returns:

{"status": "error", "message": "'url_finder'"}

This seems to indicate a problem with the project. However, when I run scrapy crawl test_ip_spider_1x, everything works fine. When I check the scrapyd log in the web interface, this is what I get:

2014-04-01 11:40:22-0400 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [01/Apr/2014:15:40:21 +0000] "POST /schedule.json HTTP/1.1" 200 47 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"
2014-04-01 11:40:58-0400 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [01/Apr/2014:15:40:57 +0000] "GET / HTTP/1.1" 200 747 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36"
2014-04-01 11:41:01-0400 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [01/Apr/2014:15:41:00 +0000] "GET /logs/ HTTP/1.1" 200 1203 "http://localhost:6800/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36"
2014-04-01 11:41:03-0400 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [01/Apr/2014:15:41:02 +0000] "GET /logs/scrapyd.log HTTP/1.1" 200 36938 "http://localhost:6800/logs/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36"
2014-04-01 11:42:02-0400 [HTTPChannel,2,127.0.0.1] Unhandled Error
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
        req.requestReceived(command, path, version)
      File "/usr/local/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
        self.process()
      File "/usr/local/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
        self.render(resrc)
      File "/usr/local/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
        body = resrc.render(self)
    --- <exception caught here> ---
      File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
        return JsonResource.render(self, txrequest)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py", line 10, in render
        r = resource.Resource.render(self, txrequest)
      File "/usr/local/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
        return m(request)
      File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 37, in render_POST
        self.root.scheduler.schedule(project, spider, **args)
      File "/usr/lib/pymodules/python2.7/scrapyd/scheduler.py", line 15, in schedule
        q = self.queues[project]
    exceptions.KeyError: 'url_finder'

2014-04-01 11:42:02-0400 [HTTPChannel,2,127.0.0.1] 127.0.0.1 - - [01/Apr/2014:15:42:01 +0000] "POST /schedule.json HTTP/1.1" 200 47 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"

Any ideas?

1 Answer:

Answer (score: 9):

To run a project on scrapyd, you must first deploy it. This is not well explained in the online documentation (especially for first-time users). Here is a solution that worked for me:

Install scrapyd-deploy. On Ubuntu or similar, you can run:

apt-get install scrapyd-deploy

In your scrapy project folder, edit scrapy.cfg and uncomment the line

 url = http://localhost:6800/
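
For reference, after uncommenting that line a typical scrapy.cfg looks roughly like this (the settings module name below is assumed from the project name in the question; use whatever your generated scrapy.cfg already contains):

[settings]
default = url_finder.settings

[deploy]
url = http://localhost:6800/
project = url_finder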

This is your deploy target; scrapy will deploy the project to this location. Next, check that scrapyd-deploy can see the deploy target:

scrapyd-deploy -l

This should output something similar to:

default http://localhost:6800/

Next, you can deploy the project (url_finder):

scrapyd-deploy default -p url_finder
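
If the deployment succeeds, scrapyd-deploy packages the project as an egg and uploads it to scrapyd; the output looks roughly like the following (the version number and spider count will differ on your machine):

Packing version 1396365600
Deploying to project "url_finder" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "url_finder", "version": "1396365600", "spiders": 1}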

Finally, run the spider:

curl http://localhost:6800/schedule.json -d project=url_finder -d spider=test_ip_spider_1x
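
With the project deployed, the same schedule.json call should now return a job id instead of the KeyError, something along the lines of (the jobid here is only an example):

{"status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511e8247444"}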