我不确定我在这里有3个争论,如果是这样,我如何从items.py调用DmozItem?这似乎是一个我遗漏的简单继承问题。此代码直接从scrapy教程网站复制。
- Shell错误 -
SyntaxError: invalid syntax
PS C:\Users\Steve\tutorial> scrapy crawl dmoz
Traceback (most recent call last):
File "c:\python27\scripts\scrapy-script.py", line 9, in <module>
load_entry_point('scrapy==1.0.3', 'console_scripts', 'scrapy')()
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\cmdline.py", line 142, in execute
cmd.crawler_process = CrawlerProcess(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\crawler.py", line 209, in __init__
super(CrawlerProcess, self).__init__(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\crawler.py", line 115, in __init__
self.spider_loader = _get_spider_loader(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\crawler.py", line 296, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\spiderloader.py", line 30, in from_settings
return cls(settings)
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\spiderloader.py", line 21, in __init__
for module in walk_modules(name):
File "C:\Python27\lib\site-packages\scrapy-1.0.3-py2.7.egg\scrapy\utils\misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "C:\Python27\lib\importlib\__init__.py", line 37, in import_module
__import__(name)
File "C:\Users\Steve\tutorial\tutorial\spiders\dmoz_spider.py", line 3, in <module>
from tutorial.items import DmozItem
File "C:\Users\Steve\tutorial\tutorial\items.py", line 11, in <module>
class DmozItem(scrapy.item):
TypeError: Error when calling the metaclass bases
module._init_() takes at most 2 arguments (3 given)
- items.py - 我的解析项目列表
import scrapy
class DmozItem(scrapy.item):
title = scrapy.Field()
link = scrapy.Field()
desc = scrapy.Field()
- dmoz_spider.py - 这是蜘蛛
import scrapy
from tutorial.items import DmozItem
class DmozSpider(scrapy.Spider):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"https://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"https://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
for sel in response.xpath('//ul/li'):
item = DmozItem()
item['title'] = sel.xpath('a/text()').extract()
item['link'] = sel.xpath('a/@href').extract()
item['desc'] = sel.xpath('text()').extract()
yield item
答案 0 :(得分:1)
您输错了 scrapy.Item 类名。
在 items.py
中,更改:
scrapy.item
到
scrapy.Item
它应该是这样的:
import scrapy
class DmozItem(scrapy.Item):
title = scrapy.Field()
link = scrapy.Field()
desc = scrapy.Field()