我问了一个新问题因为我知道在最后一个问题上我不够清楚。 我正在尝试遵循scrapy教程,但我陷入了关键步骤,即“scrapy crawl dmoz”命令。 代码就是这个(我在python shell中编写了这个代码并保存它,输入.py扩展名):
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 12:20:15)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>> from scrapy.spider import BaseSpider
class dmoz(BaseSpider):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
filename = response.url.split("/")[-2]
open(filename, 'wb').write(response.body)
>>>
我正在使用的目录应该没问题,请在树下面找到:
.
├── scrapy.cfg
└── tutorial
├── __init__.py
├── __init__.pyc
├── items.py
├── pipelines.py
├── settings.py
├── settings.pyc
└── spiders
├── __init__.py
├── __init__.pyc
└── dmoz_spider.py
2 directories, 10 files
现在,当我尝试运行“scapy crawl dmoz”时,我得到了这个:
$ scrapy crawl dmoz
2013-08-14 12:51:40+0200 [scrapy] INFO: Scrapy 0.16.5 started (bot: tutorial)
2013-08-14 12:51:40+0200 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/scrapy", line 5, in <module>
pkg_resources.run_script('Scrapy==0.16.5', 'scrapy')
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources.py", line 499, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources.py", line 1235, in run_script
execfile(script_filename, namespace, namespace)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
execute()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/cmdline.py", line 131, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/cmdline.py", line 76, in _run_print_help
func(*a, **kw)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/cmdline.py", line 138, in _run_command
cmd.run(args, opts)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/commands/crawl.py", line 43, in run
spider = self.crawler.spiders.create(spname, **opts.spargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/command.py", line 33, in crawler
self._crawler.configure()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/crawler.py", line 40, in configure
self.spiders = spman_cls.from_crawler(self)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/spidermanager.py", line 35, in from_crawler
sm = cls.from_settings(crawler.settings)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/spidermanager.py", line 31, in from_settings
return cls(settings.getlist('SPIDER_MODULES'))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/spidermanager.py", line 22, in __init__
for module in walk_modules(name):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.5-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
submod = __import__(fullpath, {}, {}, [''])
File "/Users//Documents/tutorial/tutorial/spiders/dmoz_spider.py", line 1
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
^
SyntaxError: invalid syntax
有人知道我正在制作的步骤有什么问题吗? 谢谢您的帮助。这是我的第一次编程经验,所以这可能是一个非常愚蠢的问题。
答案 0 :(得分:1)
这不是缩进问题,错误信息很明确:
File "/Users//Documents/tutorial/tutorial/spiders/dmoz_spider.py", line 1
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
^
SyntaxError: invalid syntax
你显然已经在IDLE中粘贴了代码,包括从IDLE开始的字符串,这些代码都没有代码。
尝试打开一个编辑器,然后在那里输入教程代码,而不是复制粘贴,你会学得更好,而且你不会在不经意间粘贴垃圾。
答案 1 :(得分:0)
缩进不正确。 它应该是:
>>>from scrapy.spider import BaseSpider
>>>class dmoz(BaseSpider):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
filename = response.url.split("/")[-2]
open(filename, 'wb').write(response.body)
我认为你复制粘贴了IDLE中的代码,请缩进课程。