In a Scrapy bot, how do I call one function from another function?

Asked: 2015-11-13 18:35:28

Tags: python python-2.7 web-scraping scrapy

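I know this is a newbie question, and really a basic Python question, but it comes up in the context of Scrapy and I can't find an answer anywhere.

When I run this bot code: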

import scrapy

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["lib-web.org"]
    start_urls = [
        "http://www.lib-web.org/united-states/public-libraries/michigan/"
    ]

    count = 0

    def increment(self):
        global count
        count += 1

    def getCount(self):
        global count
        return count

    def parse(self, response):
        increment()
        for sel in response.xpath('//div/div/div/ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('p/text()').extract()
            x = getCount()
            print x
            yield item

DmozItem:

import scrapy

class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

I get this error:

File "/Users/Admin/scpy_projs/tutorial/tutorial/spiders/dmoz_spider.py", line 23, in parse
    increment()
NameError: global name 'increment' is not defined

Why can't I call increment() inside parse(self, response)? How can I make this work?

Thanks for any help.

1 answer:

Answer 0 (score: 7):

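increment() is an instance method of your spider, so call it as self.increment(). A bare increment() is looked up as a module-level (global) name, which is why Python reports that the global name is not defined.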
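
The same lookup rule can be reproduced without Scrapy. The following is a minimal sketch using a hypothetical Counter class (not part of the original spider); the names broken_parse and working_parse are made up for illustration only:

```python
class Counter(object):
    """Hypothetical stand-in for the spider; no Scrapy required."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

    def broken_parse(self):
        # Bare name: Python searches local, global and built-in scopes,
        # but never the class body, so this raises NameError when called.
        increment()

    def working_parse(self):
        # Attribute lookup on the instance finds the method on the class.
        self.increment()
        return self.count


counter = Counter()

try:
    counter.broken_parse()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

result = counter.working_parse()
```

Here error_name ends up as "NameError" and result as 1, which mirrors the traceback in the question and the effect of switching to self.increment().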

Also, there is no need for global variables here. Define count as an instance variable instead.
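
As a rough illustration of why the global statement does not help in the question's code: count = 0 in the class body creates a class attribute, not a module-level name, so global count points at something that does not exist. The sketch below assumes a hypothetical PageCounter class and method names chosen only for this example:

```python
class PageCounter(object):
    count = 0  # class attribute, NOT a module-level global

    def increment_global(self):
        # "global count" refers to a module-level name called count.
        # No such name is defined in this module, so the next line
        # raises NameError when the method runs.
        global count
        count += 1

    def increment_instance(self):
        # Reads the class attribute (0) the first time, then stores
        # the result as an attribute on this particular instance.
        self.count += 1


first = PageCounter()
second = PageCounter()

try:
    first.increment_global()
    global_error = None
except NameError as exc:
    global_error = type(exc).__name__

first.increment_instance()
first.increment_instance()
second.increment_instance()
```

After this runs, first.count is 2, second.count is 1, and PageCounter.count is still 0. Setting self.count in __init__, as in the fixed version, makes the per-instance state explicit instead of relying on this fallback to the class attribute.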

Fixed version:

import scrapy

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["lib-web.org"]
    start_urls = [
        "http://www.lib-web.org/united-states/public-libraries/michigan/"
    ]

    def __init__(self, *args, **kwargs):
        super(DmozSpider, self).__init__(*args, **kwargs)

        self.count = 0

    def increment(self):
        self.count += 1

    def getCount(self):
        return self.count

    def parse(self, response):
        self.increment()

        for sel in response.xpath('//div/div/div/ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('p/text()').extract()
            x = self.getCount()
            print x

            yield item

You can also define count as a property.
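
One possible shape for the property variant, sketched on a plain hypothetical class rather than on scrapy.Spider (the _count attribute name is an assumption for this example, not something from the original code):

```python
class PageCounter(object):
    """Hypothetical counter exposing count as a read-only property."""

    def __init__(self):
        self._count = 0  # private backing field (assumed name)

    @property
    def count(self):
        # Accessed as obj.count, without parentheses.
        return self._count

    def increment(self):
        self._count += 1


pages = PageCounter()
pages.increment()
pages.increment()
seen = pages.count

try:
    pages.count = 10  # no setter defined, so assignment is rejected
    assign_error = None
except AttributeError as exc:
    assign_error = type(exc).__name__
```

With this layout getCount() is no longer needed: callers read pages.count directly, and because no setter is defined, accidental assignment raises AttributeError instead of silently overwriting the counter.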