Here is my code:
from scrapy import *
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class lala(CrawlSpider):
    name="lala"
    start_url=["http://www.lala.net/"]
    rule = [Rule(SgmlLinkExtractor(), follow=True, callback='self.parse')]

    def __init__(self):
        super(lala, self).__init__(self)
        print "\nworking\n"

    def parse(self,response):
        print "\n\n Middle \n"

print "\nend\n"
The problem is:
UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
Note that in this case both end and working are printed.
If I remove __init__, there is no error, but parse is never called, because the Middle message is not printed.
Answer 0 (score: 2)
There is no need to pass self when calling the inherited __init__() method through super():
    def __init__(self):
        super(lala, self).__init__()
Also, looking at the example listed in the documentation, the attribute should be called rules, not rule.
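A minimal sketch of why the extra self matters. It assumes the base initializer accepts an optional name as its first positional argument; BaseSpiderStandIn is a hypothetical stand-in written for this illustration, not Scrapy's real class. Under that assumption, the extra argument replaces the spider's string name with the spider object itself, which is one plausible reason the log formatter reports an unformattable object.

```python
class BaseSpiderStandIn(object):
    """Hypothetical stand-in for a spider base class, not Scrapy's real one."""
    name = None

    def __init__(self, name=None):
        # An optional first positional argument overrides the class-level name.
        if name is not None:
            self.name = name


class Broken(BaseSpiderStandIn):
    name = "lala"

    def __init__(self):
        # Bug: super() already binds self, so this passes the instance as `name`.
        super(Broken, self).__init__(self)


class Fixed(BaseSpiderStandIn):
    name = "lala"

    def __init__(self):
        # Correct: no extra argument, the class-level name is left alone.
        super(Fixed, self).__init__()


broken = Broken()
fixed = Fixed()
print(broken.name is broken)  # True: the name is now the spider object
print(fixed.name)             # lala
```

With the corrected call, name stays a plain string, so anything that formats it into a log line has a string to work with.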
Answer 1 (score: 1)
The Scrapy documentation explicitly warns against using a CrawlSpider and overriding the parse method.
Try renaming your parse method to parse_item, then try again.
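The warning can be illustrated with a toy model. MiniCrawlSpider below is a hypothetical, heavily simplified stand-in, not Scrapy's implementation; it only assumes that the base class uses parse() itself to apply the rules and that each rule names its callback method as a string.

```python
class MiniCrawlSpider(object):
    """Hypothetical, heavily simplified stand-in for a rule-driven spider."""
    rules = ()  # names of callback methods, one per rule

    def parse(self, response):
        # The base class owns parse(): it runs the callback of every rule.
        return [getattr(self, name)(response) for name in self.rules]


class OverridesParse(MiniCrawlSpider):
    rules = ("parse",)

    def parse(self, response):
        # Replaces the dispatch logic above, so the rules are never applied.
        return "Middle"


class UsesParseItem(MiniCrawlSpider):
    rules = ("parse_item",)

    def parse_item(self, response):
        # A separately named callback leaves the inherited parse() intact.
        return "Middle: " + response


overriding = OverridesParse().parse("page")
renamed = UsesParseItem().parse("page")
print(overriding)
print(renamed)
```

In the first subclass the rule-handling code is shadowed entirely; in the second, the inherited parse() still runs and hands the response to parse_item.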