I'm following the tutorial at https://www.practicalecommerce.com/Monitor-Competitor-Prices-with-Python-and-Scrapy step by step, but when I get to the part where you run the spider with the command:
scrapy crawl massEffect -o results.csv
it shows this error:
NameError: global name 'TfawItem' is not defined
What am I doing wrong?
Here is my items.py:
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy

class TfawItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
    upc = scrapy.Field()
    url = scrapy.Field()
My massEffect.py:
# -*- coding: utf-8 -*-
import scrapy

class MasseffectSpider(scrapy.Spider):
    name = 'massEffect'
    allowed_domains = ['tfaw.com']
    start_urls = [
        'http://www.tfaw.com/Companies/Dark-Horse/Series?series_name=Mass+Effect',
    ]

    def parse(self, response):
        for href in response.css('div a.boldlink::attr(href)'):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        comic = TfawItem()
        comic['title'] = response.css('div.iconistan + b span.blackheader::text').extract()
        comic['price'] = response.css('span.blackheader ~ span.redheader::text').re('[$]\d+\.\d+')
        comic['upc'] = response.xpath('/html/body/table[1]/tr/td[4]/table[3]/tr/td/table/tr/td[contains(., "UPC:")]/following-sibling::td[1]/text()').extract()
        comic['url'] = response.url
        yield comic
My project's hierarchy:
tfaw/
    scrapy.cfg
    results.csv
    tfaw/
        __init__.py
        __init__.pyc
        items.py
        middlewares.py
        pipelines.py
        settings.py
        settings.pyc
        spiders/
            __init__.py
            __init__.pyc
            massEffect.py
            massEffect.pyc
Answer 0 (score: 0)
You haven't imported TfawItem into your massEffect.py file. Depending on your Python version, you can do either of the following:
from ..items import TfawItem
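or
from modulename.items import TfawItem
Here modulename stands for the name of your project package, which is tfaw in the hierarchy shown in the question.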
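To see why the import matters without running Scrapy at all, here is a minimal sketch. It builds a throwaway package in a temporary directory that mirrors the layout from the question. The package name demo_tfaw, the module names broken and fixed, and the helper make_item are all made up for this illustration, and TfawItem is a plain dict subclass here rather than a scrapy.Item.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring the project layout (no Scrapy needed).
# All names below (demo_tfaw, broken, fixed, make_item) are hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tfaw"
(pkg / "spiders").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "spiders" / "__init__.py").write_text("")
(pkg / "items.py").write_text("class TfawItem(dict):\n    pass\n")

# Spider-like module WITHOUT the import: the module loads fine, because
# the name TfawItem is only looked up when make_item() is called.
(pkg / "spiders" / "broken.py").write_text(
    "def make_item():\n    return TfawItem()\n"
)

# The same module WITH the relative import added at the top.
(pkg / "spiders" / "fixed.py").write_text(
    "from ..items import TfawItem\n\n"
    "def make_item():\n    return TfawItem()\n"
)

# Make the temporary directory importable, then load both modules.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
broken = importlib.import_module("demo_tfaw.spiders.broken")
fixed = importlib.import_module("demo_tfaw.spiders.fixed")

# Calling the broken version raises NameError, as in the question.
try:
    broken.make_item()
    broken_error = None
except NameError as exc:
    broken_error = exc

# The fixed version resolves TfawItem through the import.
item = fixed.make_item()

# The absolute form (package.items) reaches the same class.
absolute_items = importlib.import_module("demo_tfaw.items")
same_class = type(item) is absolute_items.TfawItem
```

This also explains why the spider in the question only fails once a detail page is parsed: the missing name is not noticed until parse_detail_page actually runs.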
Answer 1 (score: 0)
You don't import TfawItem anywhere in massEffect.py.
Add from ..items import TfawItem to the top of massEffect.py.