I know this is a beginner question, and a basic Python one at that, but it comes up in the context of Scrapy and I couldn't find an answer anywhere.
When I run this spider code:
    import scrapy
    from tutorial.items import DmozItem

    class DmozSpider(scrapy.Spider):
        name = "dmoz"
        allowed_domains = ["lib-web.org"]
        start_urls = [
            "http://www.lib-web.org/united-states/public-libraries/michigan/"
        ]
        count = 0

        def increment(self):
            global count
            count += 1

        def getCount(self):
            global count
            return count

        def parse(self, response):
            increment()
            for sel in response.xpath('//div/div/div/ul/li'):
                item = DmozItem()
                item['title'] = sel.xpath('a/text()').extract()
                item['link'] = sel.xpath('a/@href').extract()
                item['desc'] = sel.xpath('p/text()').extract()
                x = getCount()
                print x
                yield item
DmozItem:

    import scrapy

    class DmozItem(scrapy.Item):
        title = scrapy.Field()
        link = scrapy.Field()
        desc = scrapy.Field()
I get this error:
    File "/Users/Admin/scpy_projs/tutorial/tutorial/spiders/dmoz_spider.py", line 23, in parse
      increment()
    NameError: global name 'increment' is not defined
Why can't I call increment() from inside parse(self, response)? How can I make this work?
Thanks for your help.
Answer 0 (score: 7)
increment() is an instance method of your spider, so call it as self.increment().
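A minimal, Scrapy-free sketch of the lookup rule behind this error, assuming Python 3; Counter is a hypothetical class standing in for the spider. A bare name inside a method is resolved in the local, enclosing, global and built-in scopes, never in the class namespace, so increment() fails, while self.increment() finds the method through the instance.

```python
class Counter:
    """Hypothetical stand-in for the spider; no Scrapy needed."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

    def broken(self):
        # Bare name: looked up in local/enclosing/global/builtin scopes,
        # not in the class body, so this raises NameError.
        increment()

    def works(self):
        # Attribute lookup on self finds the method defined on the class.
        self.increment()
        return self.count


c = Counter()
try:
    c.broken()
except NameError as exc:
    error_message = str(exc)

print(error_message)  # name 'increment' is not defined
print(c.works())      # 1
```

On Python 2, as in the question's traceback, the same failure is reported as "global name 'increment' is not defined".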
Also, there is no need for a global variable: define count as an instance variable.
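A short sketch, assuming Python 3, of why the global was a second bug in the question's code: count = 0 in the class body is a class attribute, while global count refers to module scope, where no count exists. GlobalCounter and InstanceCounter are hypothetical names used only for illustration.

```python
class GlobalCounter:
    count = 0  # class attribute: lives on the class, not in module scope

    def increment(self):
        global count  # refers to module scope, where 'count' is undefined
        count += 1


class InstanceCounter:
    def __init__(self):
        self.count = 0  # per-instance state, no global needed

    def increment(self):
        self.count += 1


try:
    GlobalCounter().increment()
except NameError as exc:
    global_error = str(exc)

a, b = InstanceCounter(), InstanceCounter()
a.increment()
a.increment()
b.increment()
print(global_error)      # name 'count' is not defined
print(a.count, b.count)  # 2 1
```

Each instance keeps its own counter, so two spiders running in the same process would not share or clobber one module-level value.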
Fixed version:
    import scrapy
    from tutorial.items import DmozItem

    class DmozSpider(scrapy.Spider):
        name = "dmoz"
        allowed_domains = ["lib-web.org"]
        start_urls = [
            "http://www.lib-web.org/united-states/public-libraries/michigan/"
        ]

        def __init__(self, *args, **kwargs):
            super(DmozSpider, self).__init__(*args, **kwargs)
            # Per-spider counter instead of a global variable.
            self.count = 0

        def increment(self):
            self.count += 1

        def getCount(self):
            return self.count

        def parse(self, response):
            # Instance methods are called through self.
            self.increment()
            for sel in response.xpath('//div/div/div/ul/li'):
                item = DmozItem()
                item['title'] = sel.xpath('a/text()').extract()
                item['link'] = sel.xpath('a/@href').extract()
                item['desc'] = sel.xpath('p/text()').extract()
                x = self.getCount()
                print(x)
                yield item
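The fixed spider overrides __init__ and forwards *args and **kwargs to the base class. A minimal sketch of why that forwarding matters, runnable without Scrapy and assuming Python 3: BaseSpider is a hypothetical stand-in that mimics the documented behaviour of scrapy.Spider's default __init__, which copies spider arguments (for example those passed with `scrapy crawl dmoz -a category=...`) onto the spider as attributes. The list-of-strings "response" is also an assumption, standing in for a real Scrapy response.

```python
class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider: it stores extra keyword
    arguments as attributes on the spider."""

    def __init__(self, name=None, **kwargs):
        self.name = name
        self.__dict__.update(kwargs)


class CountingSpider(BaseSpider):
    def __init__(self, *args, **kwargs):
        # Forward everything so the base class still does its own setup.
        super().__init__(*args, **kwargs)
        self.count = 0

    def increment(self):
        self.count += 1

    def parse(self, response):
        # 'response' is a hypothetical list of strings, not a Scrapy object.
        self.increment()
        for text in response:
            yield {"title": text, "seen_responses": self.count}


spider = CountingSpider(name="dmoz", category="michigan")
items = []
for fake_response in (["a", "b"], ["c"]):
    items.extend(spider.parse(fake_response))

print(spider.count)     # 2
print(spider.category)  # michigan

# parse() is a generator: its body, including self.increment(),
# only runs once something iterates over it.
pending = spider.parse(["d"])
print(spider.count)     # still 2
```

In a real crawl Scrapy iterates over what parse() returns, so self.increment() runs once per response it processes.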