'MySpider' object has no attribute 'parse_item'

Posted: 2015-07-16 17:06:51

Tags: python web-crawler scrapy attributeerror scrapy-spider

I'm new to Scrapy. I have a basic spider, similar to the example below:

class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['example.com'] #the domain where the spider is allowed to crawl
    start_urls = ['http://www.example.com/content/'] #url from which the spider will start crawling
    page_incr = 1
    flag = 0

    def parse(self, response):
            sel=Selector(response)
            stuffs = sel.xpath('//a/@href')
            for stuff in stuffs:
                link = stuff.extract()
                req1 = Request(url=link, callback=self.parse_item)
                yield req1

            url = 'http://www.example.com/content/?q=ajax//date/%d&page=%d' % (self.page_incr, self.page_incr)
            req2 = Request(url=url,
                          headers={"Referer": "http://www.example.com/content", "X-Requested-With": "XMLHttpRequest"},
                          callback=self.parse_xhr)
            yield req2

    def parse_xhr(self, response):
            sel=Selector(response)
            stuffs = sel.xpath('//a/@href')
            for stuff in stuffs:
                link = stuff.extract()
                yield Request(url=link, callback=self.parse_item)

            content = sel.xpath('//a/@href').extract()
            if content == []:
                self.flag +=1
                if self.flag == 5:
                    raise CloseSpider('WARNING: <Spider forced to stop>')
            else:
                self.flag = 0

            self.page_incr +=1
            url = 'http://www.example.com/content/?q=ajax//date/%d&page=%d' % (self.page_incr, self.page_incr)
            req3 = Request(url=url,
                      headers={"Referer": "http://www.example.com/content", "X-Requested-With": "XMLHttpRequest"},
                      callback=self.parse_xhr)
            yield req3

     def parse_item(self, response):
            pass

When I try to run the crawl, I get the following error:

line 24, in parse
        req1 = Request(url=link, callback=self.parse_item)
    exceptions.AttributeError: 'MySpider' object has no attribute 'parse_item'

I don't understand it... please help me see what's wrong! Thanks for your time and help.

1 answer:

Answer 0 (score: 1)

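Your parse_item() method is indented incorrectly (it has 5 spaces instead of 4), so it is not defined as a method of the MySpider class and self.parse_item does not exist when parse() builds the Request. Indent it at the same level as parse() and parse_xhr().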
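To illustrate the mechanism without Scrapy, here is a minimal sketch. The class names `BrokenSpider` and `FixedSpider` and the helper `has_callback` are hypothetical and not part of the question's code. It assumes the misplaced `def` ends up at the indentation level of the previous method's body. In that case Python treats it as a local function of that method rather than a method of the class, so looking it up on the instance raises the same kind of `AttributeError`. (With spaces only, an inconsistent dedent such as 5 spaces under a 4-space class body usually raises `IndentationError` instead, so what the question shows may differ slightly from the file that actually ran.)

```python
class BrokenSpider:
    def parse_xhr(self):
        # This def sits inside parse_xhr's body, so it is only a local
        # function of parse_xhr and never becomes a class attribute.
        def parse_item(response):
            pass
        return "done"


class FixedSpider:
    def parse_xhr(self):
        return "done"

    # Indented exactly one level inside the class body: a real method.
    def parse_item(self, response):
        pass


def has_callback(spider_cls, name="parse_item"):
    """Hypothetical helper: True if instances of spider_cls expose `name`."""
    return hasattr(spider_cls(), name)


print(has_callback(BrokenSpider))  # False
print(has_callback(FixedSpider))   # True

try:
    BrokenSpider().parse_item  # same lookup as callback=self.parse_item
except AttributeError as exc:
    print(exc)
```

In the spider itself, the fix is the same as in `FixedSpider`: start `def parse_item(self, response):` at the same column as `def parse` and `def parse_xhr`.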