Unformattable object error

Time: 2013-04-09 13:15:53

Tags: python python-2.7 scrapy

Here is my code:

from scrapy import * 
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector 
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class lala(CrawlSpider):
    name="lala"
    start_url=["http://www.lala.net/"]       
    rule = [Rule(SgmlLinkExtractor(), follow=True, callback='self.parse')] 

    def __init__(self):
        super(lala, self).__init__(self)    
        print "\nworking\n"

    def parse(self,response):        
        print "\n\n Middle \n"  

print "\nend\n"

The problem is this output:

UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST

Note that in this case both "end" and "working" are printed.

If I remove __init__, there is no error, but parse is never called: the "Middle" message is never printed.

2 answers:

Answer 0 (score: 2)

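You are passing self when calling the inherited __init__() method via super(). super(lala, self) has already bound self, so the call should be:

def __init__(self):
    super(lala, self).__init__()

Also, looking at the example listed in the documentation, the attribute should be called rules, not rule.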
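A minimal plain-Python sketch of why the extra self matters. It assumes, as with Scrapy's base spider of that era, that the parent constructor has the signature __init__(self, name=None, **kwargs) and stores a non-None name on the instance; the Spider, Buggy and Fixed classes are hypothetical stand-ins, not Scrapy code. Passing self explicitly puts the spider object itself into the name parameter, so the spider's name stops being a string. That is a plausible source of the unformattable log messages, although this sketch does not reproduce Scrapy's logging.

```python
class Spider(object):
    """Hypothetical stand-in for a base spider whose constructor is
    __init__(self, name=None, **kwargs)."""

    name = None

    def __init__(self, name=None, **kwargs):
        # A non-None name argument overrides the class-level attribute.
        if name is not None:
            self.name = name


class Buggy(Spider):
    name = "lala"

    def __init__(self):
        # Bug: super() has already bound self, so this extra self
        # is received as the `name` argument.
        super(Buggy, self).__init__(self)


class Fixed(Spider):
    name = "lala"

    def __init__(self):
        # Correct: no explicit self in the argument list.
        super(Fixed, self).__init__()


buggy = Buggy()
fixed = Fixed()
print(buggy.name is buggy)  # True: the "name" is now the spider object
print(fixed.name)           # lala
```

The same signature works in both Python 2.7 and Python 3, so the fix carries over unchanged.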

Answer 1 (score: 1)

The Scrapy documentation explicitly warns against overriding the parse method when using a CrawlSpider, because CrawlSpider uses parse itself to apply its rules.

Try renaming the parse method to parse_item and run the spider again.
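After the rename, the Rule's callback also has to name the method as a plain string, callback='parse_item', not 'self.parse'. As far as I know, CrawlSpider resolves a string callback by looking it up as an attribute of the spider, so a "self." prefix finds nothing and the rule ends up with no callback. The function below is a simplified, hypothetical model of that lookup (resolve_callback and Demo are illustrative names, not Scrapy API). Separately, the question's code sets start_url, while Scrapy reads start_urls, so with that typo no start requests are made, which may by itself explain why "Middle" is never printed.

```python
class Demo(object):
    """Hypothetical spider-like class with the renamed callback."""

    def parse_item(self, response):
        return "parsed " + response


def resolve_callback(spider, callback):
    """Simplified model of resolving a Rule callback: callables are
    used as-is, strings are looked up as attributes of the spider."""
    if callable(callback):
        return callback
    return getattr(spider, callback, None)


spider = Demo()
print(resolve_callback(spider, "self.parse_item"))     # None
print(resolve_callback(spider, "parse_item")("page"))  # parsed page
```

In the real spider the rule would then read along the lines of rules = [Rule(SgmlLinkExtractor(), callback='parse_item', follow=True)].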