I want to scrape a series of consecutive pages, so when the current page has a next page I yield a new Request from Scrapy. However, I found that the Request callback is never called. Here is my code and the output:
next page url: http://www.wowsai.com/index.php?app=store&act=credit&id=682376&page=2#module
2015-05-06 10:00:47+0800 [spider22] INFO: Closing spider (finished)
def parse(self, response):
    ...
    # 2. get the next page url and trigger another request to parse it
    # (sel is a Selector built from response in the elided part above)
    if "page" not in response.url:
        nextpage_url = 'http://www.wowsai.com/' + sel.xpath('//div[@id="pageBox"]/a[1]/@href').extract()[0]
    else:
        nextpage_url = 'http://www.wowsai.com/' + sel.xpath('//div[@id="pageBox"]/a[2]/@href').extract()[0]
    print "next page url:", nextpage_url
    yield Request(nextpage_url, callback=self.parsePage)

def parsePage(self, response):
    print response.url
    print "here is parsePage"