PS C:\ users \ steve \ tutorial> scrapy crawl dmoz
Traceback (most recent call last):
File "c:\python27\scripts\scrapy-script.py", line 9, in <module>
load_entry_point('scrapy==1.0.3', 'console_scripts', 'scrapy')()
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\cmdline.py",
cmd.crawler_process = CrawlerProcess(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\crawler.py",
super(CrawlerProcess, self).__init__(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\crawler.py",
self.spider_loader = _get_spider_loader(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\crawler.py",
return loader_cls.from_settings(settings.frozencopy())
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\spiderloader.
return cls(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\spiderloader.
for module in walk_modules(name):
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\utils\misc.py
submod = import_module(fullpath)
File "C:\Python27\lib\importlib\__init__.py", line 37, in import_module
__import__(name)
File "C:\users\steve\tutorial\tutorial\spiders\dmoz.py", line 4, in <module>
class dmozspider(spiders):
TypeError: Error when calling the metaclass bases module.__init__() takes at most 2 arguments (3 given)
我的dmoz spider python脚本在这里
from scrapy import spiders
class dmozspider(spiders):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
filename = response.url.split("/")[-2] + '.html'
with open(filename, 'wb') as f:
f.write(response.body)
答案 0 :(得分:0)
问题是您正在导入“蜘蛛”,并将其用作基类。 “蜘蛛”是包含蜘蛛的包,即Spider
类。要使用它,请使用:
from scrapy.spiders import Spider
class dmozspider(Spider):
... # Rest of your code