I'm new to scrapy and I have a basic spider similar to the one below:
import scrapy
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.exceptions import CloseSpider


class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['example.com']  # the domain where the spider is allowed to crawl
    start_urls = ['http://www.example.com/content/']  # url from which the spider will start crawling
    page_incr = 1
    flag = 0

    def parse(self, response):
        sel = Selector(response)
        stuffs = sel.xpath('//a/@href')
        for stuff in stuffs:
            link = stuff.extract()
            req1 = Request(url=link, callback=self.parse_item)
            yield req1

        url = 'http://www.example.com/content/?q=ajax//date/%d&page=%d' % (self.page_incr, self.page_incr)
        req2 = Request(url=url,
                       headers={"Referer": "http://www.example.com/content", "X-Requested-With": "XMLHttpRequest"},
                       callback=self.parse_xhr)
        yield req2

    def parse_xhr(self, response):
        sel = Selector(response)
        stuffs = sel.xpath('//a/@href')
        for stuff in stuffs:
            link = stuff.extract()
            yield Request(url=link, callback=self.parse_item)

        content = sel.xpath('//a/@href').extract()
        if content == []:
            self.flag += 1
            if self.flag == 5:
                raise CloseSpider('WARNING: <Spider forced to stop>')
        else:
            self.flag = 0

        self.page_incr += 1
        url = 'http://www.example.com/content/?q=ajax//date/%d&page=%d' % (self.page_incr, self.page_incr)
        req3 = Request(url=url,
                       headers={"Referer": "http://www.example.com/content", "X-Requested-With": "XMLHttpRequest"},
                       callback=self.parse_xhr)
        yield req3

     def parse_item(self, response):  # note: this def is indented with 5 spaces, not 4
         pass
When I try to run the crawl, I get the following error:
line 24, in parse
req1 = Request(url=link, callback=self.parse_item)
exceptions.AttributeError: 'MySpider' object has no attribute 'parse_item'
I don't get it... Please help me figure out what's wrong! Thanks for your time and help.
Answer 0 (score: 1)
Your parse_item() method is indented incorrectly (5 spaces instead of 4). Re-indent it to the same level as your other methods and the AttributeError will go away.
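
For illustration, here is a minimal sketch of the corrected layout, assuming only the indentation needs to change (the class name, domain, and URL are taken from the question; the crawling logic in parse() is abbreviated to a single placeholder request):

import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/content/']

    def parse(self, response):
        # same link extraction as in the question, shortened here
        for href in response.xpath('//a/@href').extract():
            yield scrapy.Request(url=href, callback=self.parse_item)

    def parse_item(self, response):
        # indented with 4 spaces, at the same level as parse(),
        # so it is defined on MySpider and self.parse_item resolves
        pass

Configuring your editor to use a consistent 4-space indent (and to show whitespace) makes this kind of mistake much easier to spot.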